On non-parametric robust quantile regression by support vector machines


On non-parametric robust quantile regression by support vector machines

Andreas Christmann

joint work with: Ingo Steinwart (Los Alamos National Lab) and Arnout Van Messem (Vrije Universiteit Brussel)

ERCIM 2008, Neuchâtel, Switzerland, June 19-21, 2008

Example: linear QR

[Figure: LIDAR data set, logratio versus range, with linear quantile regression fits for the quantiles α = 0.95, 0.50, 0.05.]
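
To reproduce fits of this kind, here is a minimal R sketch. It assumes the LIDAR data are available as the lidar data frame of the SemiPar package (columns range and logratio); this is our assumption, not stated in the talk.

# Linear quantile regression on the LIDAR data (sketch).
library(quantreg)
library(SemiPar)
data(lidar)
# One linear fit per quantile level alpha = 0.05, 0.50, 0.95
fits <- rq(logratio ~ range, tau = c(0.05, 0.50, 0.95), data = lidar)
plot(lidar$range, lidar$logratio, xlab = "range", ylab = "logratio")
for (j in 1:3) abline(coef(fits)[1, j], coef(fits)[2, j], lty = j)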

Example: SVM for quantile regression

[Figure: LIDAR data set, logratio versus range, with SVM quantile regression fits for the quantiles α = 0.95, 0.50, 0.05.]
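
An analogous non-parametric fit can be sketched with kernlab's kqr(); we assume its tau, C and kernel arguments here, and C = 10 is an illustrative value, not the one behind the slide.

# Kernel (SVM-type) quantile regression on the same data (sketch).
library(kernlab)
x <- matrix(lidar$range, ncol = 1)
y <- lidar$logratio
plot(lidar$range, lidar$logratio, xlab = "range", ylab = "logratio")
for (a in c(0.05, 0.50, 0.95)) {
  fit <- kqr(x, y, tau = a, C = 10, kernel = "rbfdot")  # Gaussian RBF kernel
  lines(as.vector(x), predict(fit, x), lty = 2)  # lidar ranges are sorted;
}                                                # otherwise order by x first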

Nonparametric Quantile Regression

Assumptions:
- $X$ = complete measurable space, e.g. $X = \mathbb{R}^d$
- $Y \subset \mathbb{R}$ closed, $Y \neq \emptyset$
- $D = D_n = (z_1, \ldots, z_n)$, $z_i := (x_i, y_i) \in Z := X \times Y$, with empirical measure $\mathrm{D} := \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, y_i)}$
- $(X_i, Y_i)$ i.i.d. $\sim P \in M_1$, $P$ (totally) unknown

Goal: estimate the quantile function
$f_{\alpha,P}(x) = \inf\{q \in Y;\ P(Y \le q \mid X = x) \ge \alpha\}, \quad x \in X.$

Assumption: $f_{\alpha,P}$ is unique.

If $f_{\alpha,P}$ is linear: Koenker & Bassett 78, Koenker 05.
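
As a concrete instance of this target (our illustrative model, not from the talk): if $Y \mid X = x \sim N(m(x), \sigma^2)$, then $f_{\alpha,P}(x) = m(x) + \sigma\, \Phi^{-1}(\alpha)$, which a large simulated sample recovers.

# Quantile function in a toy model Y | X = x ~ N(m(x), sigma^2):
# f_{alpha,P}(x) = m(x) + sigma * qnorm(alpha).
m <- sin; sigma <- 0.3; alpha <- 0.95; x0 <- 1
f.true <- m(x0) + sigma * qnorm(alpha)
y <- m(x0) + sigma * rnorm(1e5)      # draws from P(dy | X = x0)
c(f.true, quantile(y, alpha))        # the two values agree closely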

Loss Function and Risk

Pinball loss function:
$L_\alpha(y, t) := \begin{cases} (\alpha - 1)(y - t), & y - t < 0 \\ \alpha\,(y - t), & y - t \ge 0 \end{cases}$

[Figure: $L_\alpha(y, t)$ plotted against $y - t$ for $\alpha = 0.1$ and $\alpha = 0.75$.]

Risk: $R_{L_\alpha,P}(f) := \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big)$, $P \in M_1$.
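
A direct R transcription of the loss, plus a sanity check of the well-known fact that the pinball risk over constants is minimized by the $\alpha$-quantile (variable names are ours):

pinball <- function(y, t, alpha) {
  r <- y - t
  ifelse(r >= 0, alpha * r, (alpha - 1) * r)
}
# The empirical L_alpha-risk over constants is minimized at the alpha-quantile:
y <- rnorm(1000)
t.grid <- seq(-3, 3, by = 0.01)
risk <- sapply(t.grid, function(t) mean(pinball(y, t, alpha = 0.75)))
c(t.grid[which.min(risk)], quantile(y, 0.75))  # both near qnorm(0.75) = 0.674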

Support Vector Machine approach

Schölkopf et al. 00 and Takeuchi et al. 06 proposed
$f_{D,\lambda} = \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^n L_\alpha\big(Y_i, f(X_i)\big) + \lambda \|f\|_H^2,$
where $H$ is a reproducing kernel Hilbert space (RKHS) and $\lambda > 0$. Its population counterpart is
$S(P) = f_{P,\lambda} = \arg\min_{f \in H} \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big) + \lambda \|f\|_H^2, \quad P \in M_1.$
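
A self-contained sketch of $f_{D,\lambda}$ (not the solver behind the slides): by the representer theorem the minimizer has the form $f = \sum_i c_i k(\cdot, x_i)$, so plain subgradient descent on the coefficient vector already illustrates the estimator.

# Regularized empirical pinball risk minimized by subgradient descent
# over the representer coefficients c (illustrative, not optimized).
svm_qr <- function(x, y, alpha, lambda, gamma, steps = 5000, eta = 0.1) {
  n <- length(y)
  K <- exp(-gamma * outer(x, x, "-")^2)          # Gaussian RBF Gram matrix
  c.vec <- rep(0, n)
  for (s in 1:steps) {
    f <- as.vector(K %*% c.vec)
    g <- ifelse(y - f >= 0, -alpha, 1 - alpha)   # dL_alpha/dt at t = f(x_i)
    grad <- as.vector(K %*% g) / n + 2 * lambda * as.vector(K %*% c.vec)
    c.vec <- c.vec - (eta / sqrt(s)) * grad      # decaying step size
  }
  function(x.new) as.vector(exp(-gamma * outer(x.new, x, "-")^2) %*% c.vec)
}
# Usage, e.g.: f.hat <- svm_qr(lidar$range, lidar$logratio, alpha = 0.5,
#                              lambda = 1e-3, gamma = 1e-4)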

Kernel

$k : X \times X \to K$ is a kernel if there exist a $K$-Hilbert space $H$ and a map $\Phi : X \to H$ such that
$k(x, x') = \langle \Phi(x'), \Phi(x) \rangle_H, \quad x, x' \in X.$

$H$ is a reproducing kernel Hilbert space (RKHS) if for every $x \in X$ the evaluation functional $\delta_x(f) := f(x)$, $f \in H$, is continuous.

Reproducing kernel: $k(\cdot, x) \in H$ for all $x \in X$, and $f(x) = \langle f, k(\cdot, x) \rangle_H$ for all $f \in H$, $x \in X$.

$\Phi$ is the canonical feature map: $\Phi(x) = k(\cdot, x)$, $x \in X$.

Bounded: $\|k\|_\infty := \sup_{x \in X} \sqrt{k(x, x)} < \infty$.

GRBF: $k(x, x') = e^{-\gamma \|x - x'\|_2^2}$, $\gamma > 0$.
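
The GRBF kernel in one line of R; since $k(x, x) = 1$ for every $x$, it is bounded with $\|k\|_\infty = 1$:

grbf <- function(x, xp, gamma) exp(-gamma * sum((x - xp)^2))
grbf(c(1, 2), c(1, 2), gamma = 0.5)   # k(x, x) = 1, hence ||k||_inf = 1
grbf(c(1, 2), c(3, 0), gamma = 0.5)   # values always lie in (0, 1]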

Consistency

$f_{D,\lambda_n}$ is risk consistent if
$R_{L_\alpha,P}(f_{D,\lambda_n}) \to_P R^*_{L_\alpha,P} := \inf_{f : X \to \mathbb{R}\ \text{measurable}} R_{L_\alpha,P}(f).$   (2.1)
$R^*_{L_\alpha,P,H} := \inf_{f \in H} R_{L_\alpha,P}(f) = R^*_{L_\alpha,P}.$   (2.2)

Large RKHS (CHR & Steinwart 08). Let $H$ be the RKHS of a bounded kernel $k : X \times X \to \mathbb{R}$ and let $\mu$ be a distribution on $X$. Then the following statements are equivalent:
1. $H$ is dense in $L_1(\mu)$.
2. (2.2) holds for all $P \in M_1$ with $P_X = \mu$ and $\mathbb{E}_P |Y| < \infty$.

Bounded L-risk (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, and $f : X \to \mathbb{R}$ with $f \in L_1(P)$.
Then: $R_{L_\alpha,P}(f) < \infty$.

Existence and uniqueness (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, $H$ the RKHS of a bounded kernel $k$, $\lambda > 0$. Then:
1. There exists a unique minimizer $S(P) = f_{P,\lambda} \in H$.
2. $\|f_{P,\lambda}\|_H \le \sqrt{R_{L_\alpha,P}(0)/\lambda}$.

Consistency (Steinwart & CHR 08a)

Assume: $H$ is a separable RKHS of a bounded measurable kernel $k$ such that $H$ is dense in $L_1(\mu)$ for all distributions $\mu$ on $X$, and $(\lambda_n)_{n \in \mathbb{N}}$ satisfies $\lambda_n \to 0$.

If $\lambda_n^2 n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
1. $R_{L_\alpha,P}(f_{D,\lambda_n}) \to_P R^*_{L_\alpha,P}$
2. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to_P 0$

If $\delta > 0$ and $\lambda_n^{2+\delta} n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
3. $R_{L_\alpha,P}(f_{D,\lambda_n}) \to R^*_{L_\alpha,P}$ almost surely
4. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to 0$ almost surely
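
A rough empirical illustration of statement 1 (our simulation, again assuming kernlab's kqr, with the cost C growing in n to mimic $\lambda_n \to 0$): the test pinball risk of the fit approaches the risk of the true conditional median.

# Risk consistency illustrated on simulated data with known median sin(x),
# using the pinball() function defined above.
library(kernlab)
set.seed(1)
alpha <- 0.5
x.test <- matrix(runif(2000, 0, 3), ncol = 1)
y.test <- sin(x.test) + 0.3 * rnorm(2000)
bayes <- mean(pinball(y.test, sin(x.test), alpha))  # risk of f_{alpha,P}
for (n in c(100, 400, 1600)) {
  x <- matrix(runif(n, 0, 3), ncol = 1)
  y <- as.vector(sin(x)) + 0.3 * rnorm(n)
  fit <- kqr(x, y, tau = alpha, C = sqrt(n), kernel = "rbfdot")
  cat(n, mean(pinball(y.test, predict(fit, x.test), alpha)), bayes, "\n")
}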

Rate of convergence

1. No-free-lunch theorem: there is no uniform rate of convergence! [Devroye 82]
2. Under many assumptions we have [Steinwart & CHR 08b]
$\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_\infty(P_X)} \le c\, n^{-1/3}.$

Robustness

What is the impact on $S(P)$ or $S(P_n)$ of violations of the assumption "$(X_i, Y_i)$ i.i.d. $\sim P$, $P \in M_1$ unknown"?

Bias, maxbias, sensitivity curve (CHR & Steinwart 07)

Assume: $\mathbb{E}_P |Y| < \infty$ and $\mathbb{E}_{\bar P} |Y| < \infty$ (can be weakened), $H$ the RKHS of a continuous and bounded kernel $k$, $\lambda > 0$, $\varepsilon > 0$. Then, with $c := \lambda^{-1} \|k\|_\infty \max\{\alpha, 1 - \alpha\}$:
1. Bias: $\|f_{(1-\varepsilon)P + \varepsilon \bar P, \lambda} - f_{P,\lambda}\|_H \le c\, \|\bar P - P\|_{tv}\, \varepsilon$
2. Maxbias: $\sup_{Q \in N_\varepsilon(P)} \|f_{Q,\lambda} - f_{P,\lambda}\|_H \le 2 c\, \varepsilon$
3. Sensitivity curve: $\|SC_n(z; S_n)\|_H \le 2 c$, $z \in X \times Y$
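
The sensitivity curve of result 3 can be probed numerically with the svm_qr sketch from above. We compare in the sup norm, a proxy justified by $\|h\|_\infty \le \|k\|_\infty \|h\|_H$, so a bounded $H$-norm sensitivity curve forces a bounded curve below (function names are ours):

# Empirical sensitivity curve at a contaminating point z = (x0, y0):
# SC_n(z) = n * (S_n(z_1,...,z_{n-1}, z) - S_{n-1}(z_1,...,z_{n-1})),
# evaluated here in the sup norm over a grid.
sc_sup <- function(x, y, x0, y0, alpha, lambda, gamma, grid) {
  f.n1 <- svm_qr(x, y, alpha, lambda, gamma)                 # n - 1 points
  f.n  <- svm_qr(c(x, x0), c(y, y0), alpha, lambda, gamma)   # plus z
  n <- length(y) + 1
  max(abs(n * (f.n(grid) - f.n1(grid))))
}
# Staying bounded even for extreme y0 is what result 3 predicts, e.g.:
# sc_sup(x, y, x0 = 1, y0 = 1e6, alpha = 0.5, lambda = 0.1, gamma = 1,
#        grid = seq(0, 3, by = 0.1))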

Bouligand Influence Function

Goal: a bounded BIF.

$g : X \to Z$ is Bouligand differentiable at $x_0 \in X$ if a positive homogeneous function $\nabla^B g(x_0) : X \to Z$ exists with
$\lim_{h \to 0} \frac{\|g(x_0 + h) - g(x_0) - \nabla^B g(x_0)(h)\|_Z}{\|h\|_X} = 0.$

Def. (CHR & Van Messem 08). The Bouligand influence function (BIF) of a function $S : M_1 \to H$ for a distribution $P$ in the direction of a distribution $Q \neq P$ is the special B-derivative (if it exists) satisfying
$\lim_{\varepsilon \downarrow 0} \frac{\|S\big((1-\varepsilon)P + \varepsilon Q\big) - S(P) - \varepsilon\, \mathrm{BIF}(Q; S, P)\|_H}{\varepsilon} = 0.$

Bounded BIF (CHR & Van Messem 08)

Assume:
- $k$ bounded and measurable
- $\mathbb{E}_P |Y| < \infty$, $\mathbb{E}_Q |Y| < \infty$ (can be weakened)
- $\delta > 0$ and positive constants $\xi_P$, $\xi_Q$, $c_P$, $c_Q$ such that for all $t \in \mathbb{R}$ with $|t - f_{P,\lambda}(x)| \le \delta \|k\|_\infty$, the following inequalities hold for all $a \in [0, 2\delta \|k\|_\infty]$ and all $x \in X$:
$P\big(Y \in [t, t+a] \mid x\big) \le c_P\, a^{1+\xi_P}, \qquad Q\big(Y \in [t, t+a] \mid x\big) \le c_Q\, a^{1+\xi_Q}.$

Then $\mathrm{BIF}(Q; S, P)$ with $S(P) := f_{P,\lambda}$
1. exists,
2. equals
$\frac{1}{2\lambda} \int_X \big( P(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dP_X(x) - \frac{1}{2\lambda} \int_X \big( Q(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dQ_X(x),$
3. is bounded.

Conclusions

Non-parametric quantile regression by SVMs
1. has a unique solution,
2. is consistent:
   - the $L_\alpha$-risk of $f_{D,\lambda_n}$ converges to the Bayes risk (in probability),
   - $f_{D,\lambda_n}$ converges to the true quantile function (in probability or a.s.),
   - a rate of convergence is available,
3. is robust if the kernel is bounded: the Bouligand influence function, sensitivity curve and maxbias are bounded,
4. is computable for large high-dimensional data sets.

References

CHR & Steinwart (2007). Bernoulli.
CHR & Steinwart (2008). Appl. Stochastic Models Bus. Ind.
CHR & Van Messem (2008). J. Mach. Learn. Res.
Koenker (2005). Quantile Regression. Cambridge University Press.
Koenker & Bassett (1978). Econometrica.
Schölkopf & Smola (2002). Learning with Kernels. MIT Press.
Steinwart & CHR (2008). Support Vector Machines. Springer.
Steinwart & CHR (2008b). Advances in Neural Information Processing Systems, 20, 305-312.
Takeuchi, Le, Sears, Smola (2006). J. Mach. Learn. Res.

Appendix: Example nonparametric QRSS

[Figure: LIDAR data set, logratio versus range, with quantile smoothing spline fits for the quantiles α = 0.95, 0.50, 0.05.]

R-package quantreg:
rqss(logratio ~ qss(range, constraint = "N", lambda = 25), tau = 0.5)