On non-parametric robust quantile regression by support vector machines


1 On non-parametric robust quantile regression by support vector machines

Andreas Christmann
Joint work with: Ingo Steinwart (Los Alamos National Lab), Arnout Van Messem (Vrije Universiteit Brussel)
ERCIM 2008, Neuchâtel, Switzerland, June 19-21, 2008

2 Example: linear quantile regression

[Figure: LIDAR data set, logratio vs. range, with fitted linear quantile lines for α = 0.95, 0.50, 0.05.]
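
As a minimal sketch (not from the talk), the linear fits above can be reproduced with Koenker's quantreg package; the LIDAR data, with variables range and logratio, is assumed to be the data frame shipped in the R package SemiPar:

    ## Linear quantile regression for alpha = 0.05, 0.50, 0.95 (sketch).
    library(quantreg)   # Koenker's quantile regression package
    library(SemiPar)    # assumed source of the 'lidar' data frame
    data(lidar)

    plot(logratio ~ range, data = lidar, pch = 16, col = "grey")
    for (tau in c(0.05, 0.50, 0.95)) {
      fit <- rq(logratio ~ range, tau = tau, data = lidar)  # Koenker & Bassett
      abline(fit)                                           # add fitted line
    }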

3 Example: SVM for quantile regression

[Figure: LIDAR data set, logratio vs. range, with fitted SVM quantile curves for α = 0.95, 0.50, 0.05.]

4 Nonparametric Quantile Regression

Assumptions:
- $X$ = complete measurable space, e.g. $X = \mathbb{R}^d$; $Y \subseteq \mathbb{R}$ closed
- data $D = D_n = (z_1, \dots, z_n)$, $z_i := (x_i, y_i) \in Z := X \times Y$; empirical measure $\mathrm{D} := \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, y_i)}$
- $(X_i, Y_i)$ i.i.d. $\sim P \in M_1$, $P$ (totally) unknown

Goal: estimate the quantile function
$$f_{\alpha,P}(x) = \inf\{ q \in Y : P(Y \le q \mid X = x) \ge \alpha \}, \qquad x \in X.$$

Assumption: $f_{\alpha,P}$ is unique.
If $f_{\alpha,P}$ is linear: Koenker & Bassett (1978), Koenker (2005).
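
A one-line numerical illustration (my example, not from the talk): if $Y \mid X = x \sim N(x, 1)$, then $f_{\alpha,P}(x) = x + z_\alpha$, where $z_\alpha$ is the standard normal $\alpha$-quantile, which R confirms by Monte Carlo:

    ## Sanity check of the quantile function for Y | X = x ~ N(x, 1).
    alpha <- 0.95; x <- 2
    x + qnorm(alpha)                        # theoretical value: 3.645
    quantile(rnorm(1e5, mean = x), alpha)   # Monte Carlo approximation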

5 Loss Function and Risk

Pinball loss function:
$$L_\alpha(y, t) := \begin{cases} (\alpha - 1)(y - t), & y - t < 0 \\ \alpha (y - t), & y - t \ge 0 \end{cases}$$

[Figure: $L_\alpha$ plotted against $y - t$ for $\alpha = 0.1$ and $\alpha = 0.75$.]

Risk:
$$\mathcal{R}_{L_\alpha,P}(f) := \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big), \qquad P \in M_1.$$
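
The pinball loss is straightforward to code; a minimal sketch of the definition above:

    ## Pinball loss L_alpha(y, t) as defined above.
    pinball <- function(y, t, alpha) {
      r <- y - t                                  # residual
      ifelse(r >= 0, alpha * r, (alpha - 1) * r)  # asymmetric absolute loss
    }

    ## For alpha = 0.75, undershooting costs three times as much as
    ## overshooting, which pushes the minimizer toward the 75% quantile.
    pinball(y = 0, t = -1, alpha = 0.75)  # undershoot: 0.75
    pinball(y = 0, t =  1, alpha = 0.75)  # overshoot:  0.25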

6 Support Vector Machine approach

Schölkopf et al. (2000) and Takeuchi et al. (2006) proposed
$$f_{D,\lambda} = \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^n L_\alpha\big(y_i, f(x_i)\big) + \lambda \|f\|_H^2,$$
where $H$ is a reproducing kernel Hilbert space (RKHS) and $\lambda > 0$. Its theoretical counterpart is
$$S(P) = f_{P,\lambda} = \arg\min_{f \in H} \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big) + \lambda \|f\|_H^2, \qquad P \in M_1.$$
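
One available implementation of this estimator is kqr() (kernel quantile regression, following Takeuchi et al. 2006) in the R package kernlab. A sketch with hypothetical tuning values, again assuming the lidar data from SemiPar:

    ## Kernel quantile regression (SVM with pinball loss), conditional median.
    library(kernlab)
    library(SemiPar)
    data(lidar)

    fit <- kqr(as.matrix(lidar$range), lidar$logratio,
               tau    = 0.5,                  # quantile level alpha
               C      = 10,                   # regularization ~ 1/lambda (hypothetical)
               kernel = "rbfdot",             # Gaussian RBF kernel
               kpar   = list(sigma = 1e-4))   # bandwidth (hypothetical)

    ## Predicted conditional median curve on a grid.
    grid <- matrix(seq(min(lidar$range), max(lidar$range), length.out = 200))
    yhat <- predict(fit, grid)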

7 Kernel

- $k : X \times X \to K$ is a kernel if there exist a $K$-Hilbert space $H$ and a map $\Phi : X \to H$ such that $k(x, x') = \langle \Phi(x'), \Phi(x) \rangle_H$ for all $x, x' \in X$.
- $H$ is a reproducing kernel Hilbert space (RKHS) if for every $x \in X$ the evaluation functional $\delta_x(f) := f(x)$, $f \in H$, is continuous.
- Reproducing kernel: $k(\cdot, x) \in H$ for all $x \in X$, and $f(x) = \langle f, k(\cdot, x) \rangle_H$ for all $f \in H$, $x \in X$.
- Canonical feature map: $\Phi(x) = k(\cdot, x)$, $x \in X$.
- Bounded kernel: $\|k\|_\infty := \sup_{x \in X} \sqrt{k(x, x)} < \infty$.
- GRBF kernel: $k(x, x') = e^{-\gamma \|x - x'\|_2^2}$, $\gamma > 0$.
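
A quick sketch of the GRBF kernel and its Gram matrix in kernlab (note that kernlab names the parameter $\gamma$ "sigma"):

    ## Gaussian RBF kernel k(x, x') = exp(-sigma * ||x - x'||^2) in kernlab.
    library(kernlab)

    k <- rbfdot(sigma = 0.5)
    k(c(1, 2), c(2, 3))               # single kernel evaluation

    X <- matrix(rnorm(10), ncol = 2)  # five points in R^2
    K <- kernelMatrix(k, X)           # 5 x 5 Gram matrix, K[i,j] = k(x_i, x_j)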

8 Consistency

$f_{D,\lambda_n}$ is risk consistent if
$$\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to_P \mathcal{R}^*_{L_\alpha,P} := \inf_{f : X \to \mathbb{R} \text{ measurable}} \mathcal{R}_{L_\alpha,P}(f). \tag{2.1}$$
A key approximation condition is
$$\mathcal{R}^*_{L_\alpha,P,H} := \inf_{f \in H} \mathcal{R}_{L_\alpha,P}(f) = \mathcal{R}^*_{L_\alpha,P}. \tag{2.2}$$

Large RKHS (CHR & Steinwart 08). Let $H$ be the RKHS of a bounded kernel $k : X \times X \to \mathbb{R}$ and $\mu$ be a distribution on $X$. Then the following statements are equivalent:
1. $H$ is dense in $L_1(\mu)$.
2. (2.2) holds for all $P \in M_1$ with $P_X = \mu$ and $\mathbb{E}_P |Y| < \infty$.

9 Bounded L-risk (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, and $f : X \to \mathbb{R}$ with $f \in L_1(P)$.
Then: $\mathcal{R}_{L_\alpha,P}(f) < \infty$.

10 Existence and uniqueness (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, $H$ the RKHS of a bounded kernel $k$, $\lambda > 0$. Then:
1. there exists a unique minimizer $S(P) = f_{P,\lambda} \in H$;
2. $\|f_{P,\lambda}\|_H \le \sqrt{\mathcal{R}_{L_\alpha,P}(0)/\lambda}$.

11 Consistency (Steinwart & CHR 08a)

Assume: $H$ is a separable RKHS of a bounded measurable kernel $k$ such that $H$ is dense in $L_1(\mu)$ for all distributions $\mu$ on $X$, and $(\lambda_n)_{n \in \mathbb{N}}$ satisfies $\lambda_n \to 0$.

If $\lambda_n^2 n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
1. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to_P \mathcal{R}^*_{L_\alpha,P}$
2. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to_P 0$

If $\delta > 0$ and $\lambda_n^{2+\delta} n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
3. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to \mathcal{R}^*_{L_\alpha,P}$ almost surely
4. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to 0$ almost surely

(For example, $\lambda_n = n^{-1/3}$ gives $\lambda_n^2 n = n^{1/3} \to \infty$, and also $\lambda_n^{2+\delta} n \to \infty$ for every $0 < \delta < 1$.)

12 Rate of convergence

1. No-free-lunch theorem: there is no uniform rate of convergence! [Devroye 82]
2. Under many additional assumptions we have [Steinwart & CHR 08b]
$$\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_\infty(P_X)} \le c\, n^{-1/3}.$$

13 Robustness

What is the impact on $S(P)$ or $S(P_n)$ of violations of the assumption "$(X_i, Y_i)$ i.i.d. $\sim P$, $P \in M_1$ unknown"?

14 Bias, maxbias, sensitivity curve (CHR & Steinwart 07)

Assume: $\mathbb{E}_P |Y| < \infty$ and $\mathbb{E}_{\tilde{P}} |Y| < \infty$ (can be weakened), $H$ the RKHS of a continuous and bounded kernel $k$, $\lambda > 0$, $\varepsilon > 0$. Then, with $c := \lambda^{-1} \|k\|_\infty \max\{\alpha, 1-\alpha\}$:
1. Bias: $\|f_{(1-\varepsilon)P + \varepsilon \tilde{P},\,\lambda} - f_{P,\lambda}\|_H \le c\, \|\tilde{P} - P\|_{tv}\, \varepsilon$
2. Maxbias: $\sup_{Q \in N_\varepsilon(P)} \|f_{Q,\lambda} - f_{P,\lambda}\|_H \le 2 c \varepsilon$
3. Sensitivity curve: $\|SC_n(z; S_n)\|_H \le 2c$ for all $z \in X \times Y$
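
As a rough numerical illustration (my sketch, with hypothetical tuning values): add one gross outlier to the data, refit, and compare the two kernel quantile fits — the pinball loss keeps the change bounded.

    ## Empirical sensitivity of the kernel quantile fit to one outlier.
    library(kernlab); library(SemiPar)
    data(lidar)
    x <- as.matrix(lidar$range); y <- lidar$logratio; n <- length(y)

    f0 <- kqr(x, y, tau = 0.5, C = 10, kernel = "rbfdot",
              kpar = list(sigma = 1e-4))
    f1 <- kqr(rbind(x, 550), c(y, 100),   # one extreme outlier (y = 100)
              tau = 0.5, C = 10, kernel = "rbfdot",
              kpar = list(sigma = 1e-4))

    grid <- matrix(seq(min(x), max(x), length.out = 200))
    sc <- (n + 1) * (predict(f1, grid) - predict(f0, grid))  # sensitivity curve
    max(abs(sc))   # stays bounded, as the theorem above guarantees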

15/16 Bouligand Influence Function

Goal: bounded BIF.

$g : X \to Z$ is Bouligand-differentiable at $x_0 \in X$ if a positively homogeneous function $\nabla^B g(x_0) : X \to Z$ exists with
$$\lim_{h \to 0} \frac{\|g(x_0 + h) - g(x_0) - \nabla^B g(x_0)(h)\|_Z}{\|h\|_X} = 0.$$

Def. (CHR & Van Messem 08). The Bouligand influence function (BIF) of a map $S : M_1 \to H$ at a distribution $P$ in the direction of a distribution $Q \ne P$ is the special B-derivative (if it exists) satisfying
$$\lim_{\varepsilon \downarrow 0} \left\| \frac{S\big((1-\varepsilon)P + \varepsilon Q\big) - S(P)}{\varepsilon} - \mathrm{BIF}(Q; S, P) \right\|_H = 0.$$


17 Bounded BIF (CHR & Van Messem 08)

Assume: $k$ bounded and measurable; $\mathbb{E}_P |Y| < \infty$, $\mathbb{E}_Q |Y| < \infty$ (can be weakened); and there exist $\delta > 0$ and positive constants $\xi_P$, $\xi_Q$, $c_P$, $c_Q$ such that for all $t \in \mathbb{R}$ with $|t - f_{P,\lambda}(x)| \le \delta \|k\|_\infty$, the following inequalities hold for all $a \in [0, 2\delta\|k\|_\infty]$ and all $x \in X$:
$$P\big(Y \in [t, t+a] \mid x\big) \le c_P\, a^{1+\xi_P}, \qquad Q\big(Y \in [t, t+a] \mid x\big) \le c_Q\, a^{1+\xi_Q}.$$

Then $\mathrm{BIF}(Q; S, P)$, with $S(P) := f_{P,\lambda}$:
1. exists;
2. equals
$$\frac{1}{2\lambda} \int_X \big( P(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dP_X(x) \;-\; \frac{1}{2\lambda} \int_X \big( Q(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dQ_X(x);$$
3. is bounded.

18 Conclusions

Non-parametric quantile regression by SVMs:
1. has a unique solution;
2. is consistent: the $L_\alpha$-risk of $f_{D,\lambda_n}$ converges to the Bayes risk (in probability), $f_{D,\lambda_n}$ converges to the true quantile function (in probability or almost surely), and a rate of convergence is available;
3. is robust if the kernel is bounded: the Bouligand influence function, sensitivity curve, and maxbias are all bounded;
4. is computable for large, high-dimensional data sets.

19 References

- CHR & Steinwart (2007). Bernoulli 13(3), 799-819.
- CHR & Steinwart (2008). Appl. Stochastic Models Bus. Ind.
- CHR & Van Messem (2008). J. Mach. Learn. Res.
- Koenker (2005). Quantile Regression. Cambridge University Press.
- Koenker & Bassett (1978). Econometrica.
- Schölkopf & Smola (2002). Learning with Kernels. MIT Press.
- Steinwart & CHR (2008). Support Vector Machines. Springer.
- Steinwart & CHR (2008b). Advances in Neural Information Processing Systems 20.
- Takeuchi, Le, Sears, Smola (2006). J. Mach. Learn. Res.

20 Appendix: Example — nonparametric QR via quantile smoothing splines

[Figure: LIDAR data set, logratio vs. range, quantiles α = 0.95, 0.50, 0.05.]
[R package quantreg: rqss(logratio ~ qss(range, constraint = "N", lambda = 25), tau = 0.5)]
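
A runnable version of that call for the three quantile levels (a sketch; lambda = 25 as on the slide, the lidar data again assumed to come from SemiPar):

    ## Quantile smoothing splines on the LIDAR data.
    library(quantreg)
    library(SemiPar)
    data(lidar)

    plot(logratio ~ range, data = lidar, pch = 16, col = "grey")
    grid <- data.frame(range = seq(min(lidar$range), max(lidar$range),
                                   length.out = 200))
    for (tau in c(0.05, 0.50, 0.95)) {
      fit <- rqss(logratio ~ qss(range, constraint = "N", lambda = 25),
                  tau = tau, data = lidar)
      lines(grid$range, predict(fit, newdata = grid))  # fitted quantile curve
    }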
