Proof and Implementation of Algorithmic Realization of Learning Using Privileged Information (LUPI) Paradigm: SVM+

Z. Berkay Celik, Rauf Izmailov, and Patrick McDaniel

Department of Computer Science and Engineering, The Pennsylvania State University
Email: {zbc102,mcdaniel}@cse.psu.edu

Applied Communication Sciences, Basking Ridge, NJ, USA
Email: rizmailov@appcomsci.com

Institute of Networking and Security Research (INSR) Technical Report, 20 December 2015, NAS-TR-0187-2015

Abstract

Vapnik et al. recently introduced a new learning paradigm called Learning Using Privileged Information (LUPI). In this paradigm, along with the standard training data, the teacher provides the student with privileged (additional) information that is not available at test time. The paradigm is realized by the SVM+ algorithm. In this report, we give the proof of the SVM+ algorithm and show how to implement it with the MATLAB quadratic programming function quadprog() provided by the Optimization Toolbox, which returns the vector that minimizes the resulting quadratic objective.

I. INTRODUCTION

Vapnik et al. recently introduced the Learning Using Privileged Information (LUPI) paradigm, which aims at improving the predictive performance of learning algorithms and reducing the amount of training data they require [1], [2], [3]. The algorithmic realization of the LUPI paradigm is the SVM+ algorithm. SVM+ produces a function f(·) that is learned from both the standard training data and the privileged information, but requires only the standard training data to make predictions at test time. The privileged information available at training time is used to reduce the prediction error. The SVM+ algorithm has recently attracted considerable attention in the machine learning community and has begun to be used in several research areas.

Heng et al. [4] applied the paradigm to facial feature detection, where head poses or gender are available only at training time; Sharmanska et al. [5] applied it to image classification by providing additional descriptions of the images at training time. Hernández-Lobato et al. treated learning with privileged information under the Gaussian process classification (GPC) framework [6]. Finally, Lopez-Paz et al. generalized the idea of distillation [7] by using privileged information and presented applications in semi-supervised, unsupervised, and multi-task learning case studies [8].

We describe the goal of the LUPI paradigm as follows. At training time, the feature space is formally split into two spaces: we are given standard vectors $X_{\mathrm{std}} = (x_1, \dots, x_\ell)$ and privileged vectors $X^* = (x_1^*, \dots, x_\ell^*)$ with target classes $y_i \in \{+1, -1\}$, where $x_i \in \mathbb{R}^N$ and $x_i^* \in \mathbb{R}^M$ for all $i = 1, \dots, \ell$. The goal is then to find the function $f(x_{\mathrm{std}}, \theta)$, mapping standard vectors to labels, that minimizes the expected loss

$$
R(\theta) = \int_{X_{\mathrm{std}}, Y} L\big(f(x_{\mathrm{std}}, \theta), y\big)\, p(x_{\mathrm{std}}, x^*, y)\, dx_{\mathrm{std}}\, dx^*\, dy, \qquad (1)
$$

where $L(f(x_{\mathrm{std}}, \theta), y)$ is the loss function and $\theta$ denotes the parameters. In this report, we give the proof of the SVM+ algorithm with two spaces in Section II and provide its implementation with the MATLAB quadprog() function [9] in Section III.

II. FORMULATION OF THE SVM+ ALGORITHM WITH TWO SPACES

We are given standard vectors $x_1, \dots, x_\ell$ and privileged vectors $x_1^*, \dots, x_\ell^*$ with target classes $y_i \in \{+1, -1\}$, where $x_i \in \mathbb{R}^N$ and $x_i^* \in \mathbb{R}^M$ for all $i = 1, \dots, \ell$. The kernels $K(x_i, x_j)$ and $K^*(x_i^*, x_j^*)$ are selected, along with positive parameters $\kappa$ and $\gamma$. The two-spaces SVM+ optimization problem is formulated as

$$
\begin{aligned}
&\sum_{i=1}^{\ell} \alpha_i
- \frac{1}{2}\sum_{i,j=1}^{\ell} y_i y_j \alpha_i \alpha_j K(x_i, x_j)
- \frac{\gamma}{2}\sum_{i,j=1}^{\ell} y_i y_j (\alpha_i - \delta_i)(\alpha_j - \delta_j) K^*(x_i^*, x_j^*) \;\to\; \max,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \qquad \sum_{i=1}^{\ell} \delta_i y_i = 0,\\
&0 \le \alpha_i \le \kappa C_i, \qquad 0 \le \delta_i \le C_i, \qquad i = 1, \dots, \ell. \qquad (2)
\end{aligned}
$$

The decision rule $f$ for a vector $z$ is

$$
f(z) = \operatorname{sign}\Big( \sum_{i=1}^{\ell} y_i \alpha_i K(x_i, z) + B \Big), \qquad (3)
$$

where, in order to compute $B$, we first have to compute the Lagrangian of (2).

The Lagrangian of (2) has the form

$$
\begin{aligned}
L(\alpha, \delta, \phi_1, \phi_2, \lambda, \mu, \nu, \rho)
&= \sum_{i=1}^{\ell} \alpha_i
- \frac{1}{2}\sum_{i,j=1}^{\ell} y_i y_j \alpha_i \alpha_j K(x_i, x_j)
- \frac{\gamma}{2}\sum_{i,j=1}^{\ell} y_i y_j (\alpha_i - \delta_i)(\alpha_j - \delta_j) K^*(x_i^*, x_j^*)\\
&\quad + \phi_1 \sum_{i=1}^{\ell} \alpha_i y_i
+ \phi_2 \sum_{i=1}^{\ell} \delta_i y_i
+ \sum_{i=1}^{\ell} \lambda_i \alpha_i
+ \sum_{i=1}^{\ell} \mu_i (\kappa C_i - \alpha_i)
+ \sum_{i=1}^{\ell} \nu_i \delta_i
+ \sum_{i=1}^{\ell} \rho_i (C_i - \delta_i), \qquad (4)
\end{aligned}
$$

with Karush-Kuhn-Tucker (KKT) conditions (for each $i = 1, \dots, \ell$)

$$
\begin{aligned}
\frac{\partial L}{\partial \alpha_i}
&= -K(x_i, x_i)\,\alpha_i - \gamma K^*(x_i^*, x_i^*)\,\alpha_i + \gamma K^*(x_i^*, x_i^*)\,\delta_i
- \sum_{k \ne i} K(x_i, x_k)\, y_i y_k \alpha_k\\
&\qquad - \gamma \sum_{k \ne i} K^*(x_i^*, x_k^*)\, y_i y_k (\alpha_k - \delta_k)
+ 1 + \phi_1 y_i + \lambda_i - \mu_i = 0,\\
\frac{\partial L}{\partial \delta_i}
&= -\gamma K^*(x_i^*, x_i^*)\,\delta_i + \gamma K^*(x_i^*, x_i^*)\,\alpha_i
+ \gamma \sum_{k \ne i} K^*(x_i^*, x_k^*)\, y_i y_k (\alpha_k - \delta_k)
+ \phi_2 y_i + \nu_i - \rho_i = 0,\\
&\lambda_i \ge 0, \quad \mu_i \ge 0, \quad \nu_i \ge 0, \quad \rho_i \ge 0,\\
&\lambda_i \alpha_i = 0, \quad \mu_i (\kappa C_i - \alpha_i) = 0, \quad \nu_i \delta_i = 0, \quad \rho_i (C_i - \delta_i) = 0,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \sum_{i=1}^{\ell} \delta_i y_i = 0. \qquad (5)
\end{aligned}
$$

We denote

$$
F_i = \sum_{k=1}^{\ell} K(x_i, x_k)\, y_k \alpha_k, \qquad
f_i = \sum_{k=1}^{\ell} K^*(x_i^*, x_k^*)\, y_k (\alpha_k - \delta_k), \qquad i = 1, \dots, \ell, \qquad (6)
$$

and rewrite (5) in the form

$$
\begin{aligned}
\frac{\partial L}{\partial \alpha_i} &= -y_i F_i - \gamma y_i f_i + 1 + \phi_1 y_i + \lambda_i - \mu_i = 0,\\
\frac{\partial L}{\partial \delta_i} &= \gamma y_i f_i + \phi_2 y_i + \nu_i - \rho_i = 0,\\
&\lambda_i \ge 0, \quad \mu_i \ge 0, \quad \nu_i \ge 0, \quad \rho_i \ge 0,\\
&\lambda_i \alpha_i = 0, \quad \mu_i (\kappa C_i - \alpha_i) = 0, \quad \nu_i \delta_i = 0, \quad \rho_i (C_i - \delta_i) = 0,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \sum_{i=1}^{\ell} \delta_i y_i = 0. \qquad (7)
\end{aligned}
$$
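As a consistency check on (5) and (7), one can differentiate the Lagrangian (4) with respect to $\alpha_i$ term by term; using the symmetry of $K$ and $K^*$ and the notation of (6), this reproduces the first stationarity condition above:

$$
\frac{\partial L}{\partial \alpha_i}
= 1 - \sum_{k=1}^{\ell} y_i y_k \alpha_k K(x_i, x_k)
- \gamma \sum_{k=1}^{\ell} y_i y_k (\alpha_k - \delta_k) K^*(x_i^*, x_k^*)
+ \phi_1 y_i + \lambda_i - \mu_i
= 1 - y_i F_i - \gamma y_i f_i + \phi_1 y_i + \lambda_i - \mu_i .
$$

An analogous differentiation with respect to $\delta_i$ yields the second line of (7).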

The first equation in (7) implies

$$
\phi_1 = -y_j \big( 1 - y_j F_j - \gamma y_j f_j + \lambda_j - \mu_j \big) \qquad (8)
$$

for all $j$. If $j$ is selected such that $0 < \alpha_j < \kappa C_j$ and $0 < \delta_j < C_j$, then (7) implies $\lambda_j = \mu_j = \nu_j = \rho_j = 0$, and (8) takes the form

$$
\phi_1 = -y_j \big( 1 - y_j F_j - \gamma y_j f_j \big)
= -y_j \Big( 1 - \sum_{i=1}^{\ell} y_i y_j K(x_i, x_j)\,\alpha_i
- \gamma \sum_{i=1}^{\ell} y_i y_j K^*(x_i^*, x_j^*)(\alpha_i - \delta_i) \Big). \qquad (9)
$$

Therefore, $B$ is computed as $B = -\phi_1$:

$$
B = y_j \Big( 1 - \sum_{i=1}^{\ell} y_i y_j K(x_i, x_j)\,\alpha_i
- \gamma \sum_{i=1}^{\ell} y_i y_j K^*(x_i^*, x_j^*)(\alpha_i - \delta_i) \Big), \qquad (10)
$$

where $j$ is such that $0 < \alpha_j < \kappa C_j$ and $0 < \delta_j < C_j$.

III. MATLAB QUADRATIC PROGRAMMING FOR THE SVM+ ALGORITHM WITH TWO SPACES

The MATLAB function quadprog(H, f, A, b, Aeq, beq, lb, ub) solves the quadratic programming (QP) problem of the form

$$
\begin{aligned}
&\tfrac{1}{2} z^T H z + f^T z \to \min,\\
&A z \le b,\\
&A_{\mathrm{eq}} z = b_{\mathrm{eq}},\\
&lb \le z \le ub. \qquad (11)
\end{aligned}
$$

Here $H, A, A_{\mathrm{eq}}$ are matrices, and $f, b, b_{\mathrm{eq}}, lb, ub$ are vectors. In our case $z = (\alpha_1, \dots, \alpha_\ell, \delta_1, \dots, \delta_\ell) \in \mathbb{R}^{2\ell}$. We now rewrite (2) in the form (11); since (2) is a maximization problem, its objective is negated to obtain the minimization in (11). The first line of (11) then corresponds to the first line of (2) when written as

$$
\tfrac{1}{2} z^T H z + f^T z \to \min,
$$

where

$$
f = (\underbrace{-1, \dots, -1}_{\ell}, \underbrace{0, \dots, 0}_{\ell})^T
\qquad \text{and} \qquad
H = \begin{pmatrix} H^{11} & H^{12} \\ H^{12} & H^{22} \end{pmatrix},
$$

where, for each pair $i, j = 1, \dots, \ell$,

$$
H^{11}_{ij} = K(x_i, x_j)\, y_i y_j + \gamma K^*(x_i^*, x_j^*)\, y_i y_j, \qquad
H^{12}_{ij} = -\gamma K^*(x_i^*, x_j^*)\, y_i y_j, \qquad
H^{22}_{ij} = +\gamma K^*(x_i^*, x_j^*)\, y_i y_j.
$$

The second line of (11) is absent. The third line of (11) corresponds to the second line of (2) when written as $A_{\mathrm{eq}} z = b_{\mathrm{eq}}$, where

$$
A_{\mathrm{eq}} = \begin{pmatrix} y_1 & y_2 & \cdots & y_\ell & 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & y_1 & y_2 & \cdots & y_\ell \end{pmatrix},
\qquad
b_{\mathrm{eq}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
$$

The fourth line of (11) corresponds to the third line of (2) when written as $lb \le z \le ub$, where

$$
lb = (\underbrace{0, \dots, 0}_{2\ell})^T, \qquad
ub = (\kappa C_1, \kappa C_2, \dots, \kappa C_\ell, C_1, C_2, \dots, C_\ell)^T.
$$

After all the arguments of quadprog(H, f, A, b, Aeq, beq, lb, ub) are defined as above, the Optimization Toolbox guide [9] can be consulted to select quadprog() options such as the optimization algorithm. The minimizer returned by the function then provides the coefficients used in the decision rule $f$ for a new sample $z$, as presented in (3):

$$
f(z) = \operatorname{sign}\Big( \sum_{i=1}^{\ell} y_i \alpha_i K(x_i, z) + B \Big). \qquad (12)
$$
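For concreteness, the construction above can be assembled into a short MATLAB sketch. This is only an illustrative sketch, not code from the report: the variable names (y, K, Kstar, kappa, gamma, Cvec, tol) are assumptions, the Gram matrices K and Kstar are presumed to have been precomputed from the standard and privileged training vectors with the selected kernels, and only quadprog() itself and its calling convention come from the Optimization Toolbox [9].

% Minimal sketch of the two-spaces SVM+ problem (2) in the quadprog() form (11).
% Assumed inputs: y (l-by-1 vector of labels in {+1,-1}), Gram matrices
% K and Kstar (l-by-l, standard and privileged kernels), scalars kappa and
% gamma, and an l-by-1 vector Cvec of per-sample costs C_i.
l  = length(y);
YY = y * y';                              % matrix of products y_i * y_j

H11 =  K .* YY + gamma * (Kstar .* YY);   % blocks of H from Section III
H12 = -gamma * (Kstar .* YY);
H22 =  gamma * (Kstar .* YY);
H   = [H11, H12; H12, H22];
f   = [-ones(l, 1); zeros(l, 1)];         % linear term: -1 for alpha, 0 for delta

Aeq = [y', zeros(1, l); zeros(1, l), y']; % sum(alpha.*y) = 0, sum(delta.*y) = 0
beq = [0; 0];
lb  = zeros(2 * l, 1);                    % 0 <= alpha, 0 <= delta
ub  = [kappa * Cvec; Cvec];               % alpha <= kappa*C_i, delta <= C_i

opts  = optimoptions('quadprog', 'Display', 'off');
z     = quadprog(H, f, [], [], Aeq, beq, lb, ub, [], opts);
alpha = z(1:l);
delta = z(l+1:end);

% Bias B from (10): pick an index j with 0 < alpha_j < kappa*C_j and
% 0 < delta_j < C_j (a small tolerance guards against round-off); the
% sketch assumes such an index exists.
tol = 1e-6;
j   = find(alpha > tol & alpha < kappa * Cvec - tol & ...
           delta > tol & delta < Cvec - tol, 1);
B   = y(j) * (1 - y(j) * (K(j, :) * (alpha .* y)) ...
                - gamma * y(j) * (Kstar(j, :) * ((alpha - delta) .* y)));

% Prediction for a new standard-space sample via (12): with kNew(i) = K(x_i, zNew),
% the predicted label is sign(kNew' * (alpha .* y) + B).

Note that only the standard-space kernel values $K(x_i, z)$ and the bias $B$ are needed at test time; the privileged kernel $K^*$ enters the training problem only.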

IV. CONCLUSION

In this report, we give the proof of the SVM+ algorithm and describe its implementation with the MATLAB quadprog() function. Given the standard training data and the privileged information, along with the selected kernels and parameters, the function returns the minimizing vector of dual variables, which can then be used to make predictions on unseen samples using only the standard training data.

REFERENCES

[1] Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 2009.
[2] Dmitry Pechyony, Rauf Izmailov, Akshay Vashist, and Vladimir Vapnik. SMO-style algorithms for learning using privileged information. In Proc. International Conference on Data Mining (DMIN), 2010.
[3] Vladimir Vapnik and Rauf Izmailov. Learning with intelligent teacher: Similarity control and knowledge transfer. In Statistical Learning and Data Sciences. Springer, 2015.
[4] Heng Yang and Ioannis Patras. Privileged information-based conditional regression forest for facial feature detection. In Proc. International Conference on Automatic Face and Gesture Recognition. IEEE, 2013.
[5] V. Sharmanska, N. Quadrianto, and C. H. Lampert. Learning to rank using privileged information. In Proc. International Conference on Computer Vision (ICCV). IEEE, 2013.
[6] Daniel Hernández-Lobato, Viktoriia Sharmanska, Kristian Kersting, Christoph H. Lampert, and Novi Quadrianto. Mind the nuisance: Gaussian process classification using privileged noise. In Advances in Neural Information Processing Systems, pages 837-845, 2014.
[7] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[8] David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, and Vladimir Vapnik. Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643, 2015.
[9] MATLAB documentation of the quadprog() function. http://www.mathworks.com/help/optim/ug/quadprog.html. [Online; accessed 19-December-2015].