Proof and Implementation of Algorithmic Realization of Learning Using Privileged Information (LUPI) Paradigm: SVM+

Z. Berkay Celik, Rauf Izmailov, and Patrick McDaniel

Department of Computer Science and Engineering, The Pennsylvania State University
Email: {zbc102,mcdaniel}@cse.psu.edu

Applied Communication Sciences, Basking Ridge, NJ, USA
Email: rizmailov@appcomsci.com

Institute of Networking and Security Research (INSR) Technical Report, 20 December 2015, NAS-TR-0187-2015

Abstract

Vapnik et al. recently introduced a new learning paradigm called Learning Using Privileged Information (LUPI). In this paradigm, along with the standard training data, the teacher provides the student with privileged (additional) information that is not available at test time. The paradigm is realized by the SVM+ algorithm. In this report, we give the proof of the SVM+ algorithm and show how to implement it with the MATLAB quadratic programming function quadprog() provided by the Optimization Toolbox, which returns the vector that minimizes the resulting quadratic objective.

I. INTRODUCTION

Vapnik et al. recently introduced the Learning Using Privileged Information (LUPI) paradigm, which aims at improving the predictive performance of learning algorithms and reducing the amount of training data they require [1], [2], [3]. The algorithmic realization of the LUPI paradigm is the SVM+ algorithm. SVM+ produces a function f(·) that is learned from both the standard training data and the privileged information, but requires only the standard training data to make predictions at test time. The privileged information available at training time is used to reduce the prediction error. The SVM+ algorithm has recently attracted considerable attention in the machine learning community and has begun to be used in several research areas.

Heng et al. [4] applied the paradigm to facial feature detection, where head poses or gender are available only at training time; Sharmanska et al. [5] applied it to image classification by providing additional descriptions of the images at training time. Hernández-Lobato et al. treated learning with privileged information under the Gaussian process classification (GPC) framework [6]. Finally, Lopez-Paz et al. generalized the idea of distillation [7] by using privileged information and presented applications in semi-supervised, unsupervised, and multi-task learning case studies [8].

We describe the goal of the LUPI paradigm as follows. At training time, the feature space is formally split into two spaces: we are given standard vectors $X_{\mathrm{std}} = (x_1, \dots, x_\ell)$ and privileged vectors $X^* = (x_1^*, \dots, x_\ell^*)$ with target classes $y_i \in \{+1, -1\}$, where $x_i \in \mathbb{R}^N$ and $x_i^* \in \mathbb{R}^M$ for all $i = 1, \dots, \ell$. The goal is then to find the function $f(x_{\mathrm{std}}, \theta)$, mapping standard vectors to labels, that minimizes the expected loss

$$
R(\theta) = \int_{X_{\mathrm{std}}, Y} L\big(f(x_{\mathrm{std}}, \theta), y\big)\, p(x_{\mathrm{std}}, x^*, y)\, dx_{\mathrm{std}}\, dx^*\, dy, \qquad (1)
$$

where $L(f(x_{\mathrm{std}}, \theta), y)$ is the loss function and $\theta$ denotes the parameters. In this report, we give the proof of the SVM+ algorithm with two spaces in Section II and provide its implementation with the MATLAB quadprog() function [9] in Section III.

II. FORMULATION OF THE SVM+ ALGORITHM WITH TWO SPACES

We are given standard vectors $x_1, \dots, x_\ell$ and privileged vectors $x_1^*, \dots, x_\ell^*$ with target classes $y_i \in \{+1, -1\}$, where $x_i \in \mathbb{R}^N$ and $x_i^* \in \mathbb{R}^M$ for all $i = 1, \dots, \ell$. The kernels $K(x_i, x_j)$ and $K^*(x_i^*, x_j^*)$ are selected, along with positive parameters $\kappa$ and $\gamma$. The two-spaces SVM+ optimization problem is formulated as

$$
\begin{aligned}
&\sum_{i=1}^{\ell} \alpha_i
- \frac{1}{2}\sum_{i,j=1}^{\ell} y_i y_j \alpha_i \alpha_j K(x_i, x_j)
- \frac{\gamma}{2}\sum_{i,j=1}^{\ell} y_i y_j (\alpha_i - \delta_i)(\alpha_j - \delta_j) K^*(x_i^*, x_j^*) \;\to\; \max,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \qquad \sum_{i=1}^{\ell} \delta_i y_i = 0,\\
&0 \le \alpha_i \le \kappa C_i, \qquad 0 \le \delta_i \le C_i, \qquad i = 1, \dots, \ell. \qquad (2)
\end{aligned}
$$

The decision rule $f$ for a vector $z$ is

$$
f(z) = \operatorname{sign}\Big( \sum_{i=1}^{\ell} y_i \alpha_i K(x_i, z) + B \Big), \qquad (3)
$$

where, in order to compute $B$, we first have to compute the Lagrangian of (2).

The Lagrangian of (2) has the form

$$
\begin{aligned}
L(\alpha, \delta, \phi_1, \phi_2, \lambda, \mu, \nu, \rho)
&= \sum_{i=1}^{\ell} \alpha_i
- \frac{1}{2}\sum_{i,j=1}^{\ell} y_i y_j \alpha_i \alpha_j K(x_i, x_j)
- \frac{\gamma}{2}\sum_{i,j=1}^{\ell} y_i y_j (\alpha_i - \delta_i)(\alpha_j - \delta_j) K^*(x_i^*, x_j^*)\\
&\quad + \phi_1 \sum_{i=1}^{\ell} \alpha_i y_i
+ \phi_2 \sum_{i=1}^{\ell} \delta_i y_i
+ \sum_{i=1}^{\ell} \lambda_i \alpha_i
+ \sum_{i=1}^{\ell} \mu_i (\kappa C_i - \alpha_i)
+ \sum_{i=1}^{\ell} \nu_i \delta_i
+ \sum_{i=1}^{\ell} \rho_i (C_i - \delta_i), \qquad (4)
\end{aligned}
$$

with Karush-Kuhn-Tucker (KKT) conditions (for each $i = 1, \dots, \ell$)

$$
\begin{aligned}
\frac{\partial L}{\partial \alpha_i}
&= -K(x_i, x_i)\,\alpha_i - \gamma K^*(x_i^*, x_i^*)\,\alpha_i + \gamma K^*(x_i^*, x_i^*)\,\delta_i
- \sum_{k \ne i} K(x_i, x_k)\, y_i y_k \alpha_k\\
&\qquad - \gamma \sum_{k \ne i} K^*(x_i^*, x_k^*)\, y_i y_k (\alpha_k - \delta_k)
+ 1 + \phi_1 y_i + \lambda_i - \mu_i = 0,\\
\frac{\partial L}{\partial \delta_i}
&= -\gamma K^*(x_i^*, x_i^*)\,\delta_i + \gamma K^*(x_i^*, x_i^*)\,\alpha_i
+ \gamma \sum_{k \ne i} K^*(x_i^*, x_k^*)\, y_i y_k (\alpha_k - \delta_k)
+ \phi_2 y_i + \nu_i - \rho_i = 0,\\
&\lambda_i \ge 0, \quad \mu_i \ge 0, \quad \nu_i \ge 0, \quad \rho_i \ge 0,\\
&\lambda_i \alpha_i = 0, \quad \mu_i (\kappa C_i - \alpha_i) = 0, \quad \nu_i \delta_i = 0, \quad \rho_i (C_i - \delta_i) = 0,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \sum_{i=1}^{\ell} \delta_i y_i = 0. \qquad (5)
\end{aligned}
$$

We denote

$$
F_i = \sum_{k=1}^{\ell} K(x_i, x_k)\, y_k \alpha_k, \qquad
f_i = \sum_{k=1}^{\ell} K^*(x_i^*, x_k^*)\, y_k (\alpha_k - \delta_k), \qquad i = 1, \dots, \ell, \qquad (6)
$$

and rewrite (5) in the form

$$
\begin{aligned}
\frac{\partial L}{\partial \alpha_i} &= -y_i F_i - \gamma y_i f_i + 1 + \phi_1 y_i + \lambda_i - \mu_i = 0,\\
\frac{\partial L}{\partial \delta_i} &= \gamma y_i f_i + \phi_2 y_i + \nu_i - \rho_i = 0,\\
&\lambda_i \ge 0, \quad \mu_i \ge 0, \quad \nu_i \ge 0, \quad \rho_i \ge 0,\\
&\lambda_i \alpha_i = 0, \quad \mu_i (\kappa C_i - \alpha_i) = 0, \quad \nu_i \delta_i = 0, \quad \rho_i (C_i - \delta_i) = 0,\\
&\sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \sum_{i=1}^{\ell} \delta_i y_i = 0. \qquad (7)
\end{aligned}
$$
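As a consistency check on (5) and (7), one can differentiate the Lagrangian (4) with respect to $\alpha_i$ term by term; using the symmetry of $K$ and $K^*$ and the notation of (6), this reproduces the first stationarity condition above:

$$
\frac{\partial L}{\partial \alpha_i}
= 1 - \sum_{k=1}^{\ell} y_i y_k \alpha_k K(x_i, x_k)
- \gamma \sum_{k=1}^{\ell} y_i y_k (\alpha_k - \delta_k) K^*(x_i^*, x_k^*)
+ \phi_1 y_i + \lambda_i - \mu_i
= 1 - y_i F_i - \gamma y_i f_i + \phi_1 y_i + \lambda_i - \mu_i .
$$

An analogous differentiation with respect to $\delta_i$ yields the second line of (7).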

The first equation in (7) implies

$$
\phi_1 = -y_j \big( 1 - y_j F_j - \gamma y_j f_j + \lambda_j - \mu_j \big) \qquad (8)
$$

for all $j$. If $j$ is selected such that $0 < \alpha_j < \kappa C_j$ and $0 < \delta_j < C_j$, then (7) implies $\lambda_j = \mu_j = \nu_j = \rho_j = 0$, and (8) takes the form

$$
\phi_1 = -y_j \big( 1 - y_j F_j - \gamma y_j f_j \big)
= -y_j \Big( 1 - \sum_{i=1}^{\ell} y_i y_j K(x_i, x_j)\,\alpha_i
- \gamma \sum_{i=1}^{\ell} y_i y_j K^*(x_i^*, x_j^*)(\alpha_i - \delta_i) \Big). \qquad (9)
$$

Therefore, $B$ is computed as $B = -\phi_1$:

$$
B = y_j \Big( 1 - \sum_{i=1}^{\ell} y_i y_j K(x_i, x_j)\,\alpha_i
- \gamma \sum_{i=1}^{\ell} y_i y_j K^*(x_i^*, x_j^*)(\alpha_i - \delta_i) \Big), \qquad (10)
$$

where $j$ is such that $0 < \alpha_j < \kappa C_j$ and $0 < \delta_j < C_j$.

III. MATLAB QUADRATIC PROGRAMMING FOR THE SVM+ ALGORITHM WITH TWO SPACES

The MATLAB function quadprog(H, f, A, b, Aeq, beq, lb, ub) solves the quadratic programming (QP) problem of the form

$$
\begin{aligned}
&\tfrac{1}{2} z^T H z + f^T z \to \min,\\
&A z \le b,\\
&A_{\mathrm{eq}} z = b_{\mathrm{eq}},\\
&lb \le z \le ub. \qquad (11)
\end{aligned}
$$

Here $H, A, A_{\mathrm{eq}}$ are matrices, and $f, b, b_{\mathrm{eq}}, lb, ub$ are vectors. In our case $z = (\alpha_1, \dots, \alpha_\ell, \delta_1, \dots, \delta_\ell) \in \mathbb{R}^{2\ell}$. We now rewrite (2) in the form (11); since (2) is a maximization problem, its objective is negated to obtain the minimization in (11). The first line of (11) then corresponds to the first line of (2) when written as

$$
\tfrac{1}{2} z^T H z + f^T z \to \min,
$$

where

$$
f = (\underbrace{-1, \dots, -1}_{\ell}, \underbrace{0, \dots, 0}_{\ell})^T
\qquad \text{and} \qquad
H = \begin{pmatrix} H^{11} & H^{12} \\ H^{12} & H^{22} \end{pmatrix},
$$

where, for each pair $i, j = 1, \dots, \ell$,

$$
H^{11}_{ij} = K(x_i, x_j)\, y_i y_j + \gamma K^*(x_i^*, x_j^*)\, y_i y_j, \qquad
H^{12}_{ij} = -\gamma K^*(x_i^*, x_j^*)\, y_i y_j, \qquad
H^{22}_{ij} = +\gamma K^*(x_i^*, x_j^*)\, y_i y_j.
$$

The second line of (11) is absent. The third line of (11) corresponds to the second line of (2) when written as $A_{\mathrm{eq}} z = b_{\mathrm{eq}}$, where

$$
A_{\mathrm{eq}} = \begin{pmatrix} y_1 & y_2 & \cdots & y_\ell & 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & y_1 & y_2 & \cdots & y_\ell \end{pmatrix},
\qquad
b_{\mathrm{eq}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
$$

The fourth line of (11) corresponds to the third line of (2) when written as $lb \le z \le ub$, where

$$
lb = (\underbrace{0, \dots, 0}_{2\ell})^T, \qquad
ub = (\kappa C_1, \kappa C_2, \dots, \kappa C_\ell, C_1, C_2, \dots, C_\ell)^T.
$$

After all the arguments of quadprog(H, f, A, b, Aeq, beq, lb, ub) are defined as above, the Optimization Toolbox guide [9] can be consulted to select quadprog() options such as the optimization algorithm. The minimizer returned by the function then provides the coefficients used in the decision rule $f$ for a new sample $z$, as presented in (3):

$$
f(z) = \operatorname{sign}\Big( \sum_{i=1}^{\ell} y_i \alpha_i K(x_i, z) + B \Big). \qquad (12)
$$
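For concreteness, the construction above can be assembled into a short MATLAB sketch. This is only an illustrative sketch, not code from the report: the variable names (y, K, Kstar, kappa, gamma, Cvec, tol) are assumptions, the Gram matrices K and Kstar are presumed to have been precomputed from the standard and privileged training vectors with the selected kernels, and only quadprog() itself and its calling convention come from the Optimization Toolbox [9].

% Minimal sketch of the two-spaces SVM+ problem (2) in the quadprog() form (11).
% Assumed inputs: y (l-by-1 vector of labels in {+1,-1}), Gram matrices
% K and Kstar (l-by-l, standard and privileged kernels), scalars kappa and
% gamma, and an l-by-1 vector Cvec of per-sample costs C_i.
l  = length(y);
YY = y * y';                              % matrix of products y_i * y_j

H11 =  K .* YY + gamma * (Kstar .* YY);   % blocks of H from Section III
H12 = -gamma * (Kstar .* YY);
H22 =  gamma * (Kstar .* YY);
H   = [H11, H12; H12, H22];
f   = [-ones(l, 1); zeros(l, 1)];         % linear term: -1 for alpha, 0 for delta

Aeq = [y', zeros(1, l); zeros(1, l), y']; % sum(alpha.*y) = 0, sum(delta.*y) = 0
beq = [0; 0];
lb  = zeros(2 * l, 1);                    % 0 <= alpha, 0 <= delta
ub  = [kappa * Cvec; Cvec];               % alpha <= kappa*C_i, delta <= C_i

opts  = optimoptions('quadprog', 'Display', 'off');
z     = quadprog(H, f, [], [], Aeq, beq, lb, ub, [], opts);
alpha = z(1:l);
delta = z(l+1:end);

% Bias B from (10): pick an index j with 0 < alpha_j < kappa*C_j and
% 0 < delta_j < C_j (a small tolerance guards against round-off); the
% sketch assumes such an index exists.
tol = 1e-6;
j   = find(alpha > tol & alpha < kappa * Cvec - tol & ...
           delta > tol & delta < Cvec - tol, 1);
B   = y(j) * (1 - y(j) * (K(j, :) * (alpha .* y)) ...
                - gamma * y(j) * (Kstar(j, :) * ((alpha - delta) .* y)));

% Prediction for a new standard-space sample via (12): with kNew(i) = K(x_i, zNew),
% the predicted label is sign(kNew' * (alpha .* y) + B).

Note that only the standard-space kernel values $K(x_i, z)$ and the bias $B$ are needed at test time; the privileged kernel $K^*$ enters the training problem only.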

IV. CONCLUSION

In this report, we give the proof of the SVM+ algorithm and describe its implementation with the MATLAB quadprog() function. Given the standard training data and the privileged information, along with the selected kernels and parameters, the function returns the minimizing vector of dual variables, which can then be used to make predictions on unseen samples using only the standard training data.

REFERENCES

[1] Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 2009.
[2] Dmitry Pechyony, Rauf Izmailov, Akshay Vashist, and Vladimir Vapnik. SMO-style algorithms for learning using privileged information. In Proc. International Conference on Data Mining (DMIN), 2010.
[3] Vladimir Vapnik and Rauf Izmailov. Learning with intelligent teacher: Similarity control and knowledge transfer. In Statistical Learning and Data Sciences. Springer, 2015.
[4] Heng Yang and Ioannis Patras. Privileged information-based conditional regression forest for facial feature detection. In Proc. International Conference on Automatic Face and Gesture Recognition. IEEE, 2013.
[5] V. Sharmanska, N. Quadrianto, and C. H. Lampert. Learning to rank using privileged information. In Proc. International Conference on Computer Vision (ICCV). IEEE, 2013.
[6] Daniel Hernández-Lobato, Viktoriia Sharmanska, Kristian Kersting, Christoph H. Lampert, and Novi Quadrianto. Mind the nuisance: Gaussian process classification using privileged noise. In Advances in Neural Information Processing Systems, pages 837-845, 2014.
[7] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[8] David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, and Vladimir Vapnik. Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643, 2015.
[9] MATLAB documentation of the quadprog() function. http://www.mathworks.com/help/optim/ug/quadprog.html. [Online; accessed 19-December-2015].