The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1


Abstract

The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1

Yao Limin, Tang Jie, and Li Juanzi
Department of Computer Science, Tsinghua University
1-308, FIT, Tsinghua University, Beijing, China, 100084
{ylm, tangjie, ljz}@keg.cs.tsinghua.edu.cn

Support Vector Machines (SVMs) aim at finding an optimal separating hyper-plane that maximally separates the two classes of training examples (more precisely, maximizes the margin between the two classes of examples). The hyper-plane, corresponding to a classifier, is obtained from the solution of a quadratic programming problem that depends on a cost parameter. The choice of the cost parameter can be critical. However, in conventional implementations of SVMs it is usually supplied by the user or set to a default value. In this paper, we study how the cost parameter determines the hyper-plane. We especially focus on the case of classification using only positive and unlabeled data. We propose an algorithm that can fit the entire solution path, choosing the best cost parameter while training SVM models. We compare the performance of the proposed algorithm with conventional implementations that use default values for the cost parameter on two synthetic data sets and two real-world data sets. Experimental results show that the proposed algorithm achieves better results when dealing with positive and unlabeled classification.

Keywords: Support Vector Machine; Cost Parameter; Positive and Unlabeled Classification.

1 Received: 2008-3-16. Supported by the National Natural Science Foundation of China (No. 90604025, No. 60703059) and the Chinese Young Faculty Research Funding (20070003093). Corresponding author: Yao Limin. Tel: 62788788-20; Email: ylm@keg.cs.tsinghua.edu.cn

Introduction

Support Vector Machines (SVMs), a new generation of learning systems based on recent advances in statistical learning theory, deliver state-of-the-art performance in real-world applications such as text categorization, hand-written character recognition, image classification, and bioinformatics [1] [2].

We recall the standard formulation of SVM. Given a set of training data {(x_1, y_1), ..., (x_n, y_n)}, in which x_i denotes an example (a feature vector) and y_i ∈ {+1, -1} denotes its classification label, the formulation of SVM in the linear case is

\min_{\beta,\beta_0}\ \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n}\xi_i
\text{s.t. } y_i(\beta_0 + \beta^T x_i) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,n \qquad (1)

where the ξ_i are non-negative slack variables that allow points to be on the wrong side of their soft margin (f(x) = ±1), as well as of the decision boundary. C is the cost parameter that controls the trade-off between the largest margin and the lowest number of errors. Intuitively, in the separable case, with a sufficiently large C, the solution achieves the maximal margin separator; in the non-separable case, the solution achieves the largest margin with minimum errors (the sum of the ξ_i).

The choice of C can be critical [3]. The characteristics of the hyper-plane can vary largely with different values of C. This is especially true in positive and unlabeled classification (PU classification for short), which aims at building classifiers with only positive and unlabeled examples, and no negative examples [4]. PU classification problems naturally arise in many real-world applications, for example text classification, homepage finding, filtering spam from people's emails, image segmentation, and information extraction. In such cases, the

user only annotates part of the positive examples while leaving the others unlabeled, and hopes that a good classifier can be learned. Unfortunately, in conventional implementations of SVMs, the cost parameter C is usually supplied by the user or set to a default value. For example, SVM-light [5] uses the average norm of the training examples as the default value. In practice, our empirical study shows that the quality of the trained SVM model may be very sensitive to C in PU classification. Thus, it is very important to discover an optimal value for C given a specific task of PU classification. This is exactly the problem addressed in this paper.

Recently, several approaches have been proposed to deal with PU classification using SVMs. For example, Liu et al. propose Biased-SVM, applying SVM to PU classification [6]. In Biased-SVM, the authors propose to choose the best value for C by employing a cross-validation method to verify the performance of the resulting SVM models under the various values. Yu proposes an extension of the standard SVM approach called SVMC (Support Vector Mapping Convergence) for PU classification [7] [8]. SVMC basically exploits the natural gap between positive and negative examples in the feature space, which eventually improves the generalization performance. However, SVMC suffers from under-sampling of positive documents, leading to overfitting at some points and poor generalization. See also Roc-SVM [4], S-EM [9], and PEBL [10]. Most of these approaches either avoid the problem of choosing C or use an empirical method (e.g. cross-validation), which results in very high computational cost. It is also possible to discard the unlabeled data and learn only from the positive data. This was done in the one-class SVM [11], which tries to learn the support of the positive distribution. However, its performance is limited because it cannot take advantage of the unlabeled data.

The other type of related work is unbalanced classification, where the task is to deal with very unbalanced numbers of positive and negative examples. Methods have been proposed in which, for example, two cost parameters are introduced for SVM to adjust the cost of false positives vs. false negatives [12]. [13] also proposes a variant of the SVM, the SVM with uneven margins, tailored for text classification with the unbalanced problem. Unbalanced classification is in nature different from PU classification: in the former problem, labels of both positive and negative examples are annotated, while in the latter problem the unlabeled examples include part of the positive examples and all negative examples.

Pontil and Verri have studied properties of Support Vector Machines [14]. They have investigated the dependence of the hyper-plane on changes of the cost parameter. In [3] the authors argue that the choice of the SVM cost parameter can be critical. They propose an algorithm, called SvmPath, which can fit the entire path of SVM solutions for all possible values of C. The algorithm is based on the property of so-called piecewise-linearity. [15] [16] also investigate the issue of the solution path of support vector regression with respect to the cost parameter. [15] derives an algorithm to compute the entire solution path of support vector regression. [16] proposes an algorithm for exploring the two-dimensional solution space defined by the regularization and cost parameters. See also [17] [18].

Our work is inspired by the work of [3]. The difference is that we focus on the problem of PU classification while [3] focuses on classical classification. This paper addresses the issue of fitting the entire solution path for SVMs in PU classification. We propose an algorithm, called PU-SvmPath, which makes the choice of the cost parameter automatically while training SVM models for PU classification. We implemented the algorithm

and conducted experiments on synthetic data to evaluate the effectiveness of the proposed PU-SvmPath. We also applied PU-SvmPath to bio-medical data classification and text classification. Experimental results show that the new algorithm is superior to the existing algorithms in PU classification.

This paper is organized as follows. In Section 1 we give the problem setting. In Section 2 we describe our approach, in Section 3 we explain our algorithm in detail, and in Section 4 we present the experimental results. We make concluding remarks in Section 5.

1. Problem Setting

In this paper, we consider PU classification. Here we first give the definition of the problem. Let {(x_1, y_1), ..., (x_n, y_n)} be a training data set, in which x_i denotes an example (a feature vector) and y_i ∈ {+1, -1} denotes a classification label. Assume that the first k-1 examples are positive (labeled +1) and the rest are unlabeled, which we consider as negative (-1). (Note: the unlabeled examples might consist of unlabeled positive examples and real negative examples.) We denote by I_+ the set of indices corresponding to y_i = +1 points (positive examples), there being n_+ = |I_+| in total. Likewise for I_- and n_-, with I = I_+ ∪ I_-. Our goal is to estimate a decision function f(x) = β^T x + β_0 (also called a classifier). The noiseless case (no errors for positive examples, errors only for unlabeled examples) results in the following SVM formulation:

\min_{\beta,\beta_0}\ \frac{1}{2}\|\beta\|^2 + C\sum_{i=k}^{n}\xi_i
\text{s.t. } y_i(\beta_0 + \beta^T x_i) \ge 1,\ i \in I_+;\quad y_i(\beta_0 + \beta^T x_i) \ge 1 - \xi_i,\ \xi_i \ge 0,\ i \in I_- \qquad (2)
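To make the noiseless PU-SVM formulation (2) concrete, the following minimal sketch evaluates its objective for a candidate hyper-plane: positive examples are hard constraints with no slack, and only the unlabeled examples (treated as y = -1) receive slack. The toy data and the candidate (beta, beta0) are illustrative assumptions, not part of the paper.

```python
# Sketch of the noiseless PU-SVM objective (2): positives are hard
# constraints y_i*f(x_i) >= 1; only unlabeled points get slack xi_i.
# Toy data and the candidate hyper-plane are illustrative assumptions.

def pu_svm_objective(pos, unl, beta, beta0, C):
    """Return (objective, feasible): 0.5*||beta||^2 + C*sum(xi_i) over the
    unlabeled points; feasible is False if any positive example violates
    its hard margin constraint."""
    f = lambda x: sum(w * xk for w, xk in zip(beta, x)) + beta0
    feasible = all(f(x) >= 1.0 for x in pos)               # y = +1 on I+
    slacks = [max(0.0, 1.0 - (-1.0) * f(x)) for x in unl]  # unlabeled as y = -1
    obj = 0.5 * sum(w * w for w in beta) + C * sum(slacks)
    return obj, feasible

pos = [(2.0, 2.0), (3.0, 1.0)]    # labeled positive examples (made up)
unl = [(-2.0, -2.0), (0.0, 0.5)]  # unlabeled examples, treated as negative
obj, ok = pu_svm_objective(pos, unl, beta=(0.5, 0.5), beta0=0.0, C=1.0)
```

Here the second unlabeled point falls inside the margin and contributes the only non-zero slack; the positives satisfy their hard constraints, so the candidate is feasible.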

Here, only the unlabeled examples have slack variables ξ_i, indicating that errors are allowed only for unlabeled examples. To distinguish this formulation from the classical SVM, we call it PU-SVM. The objective function in the above formulation can be written in an alternative, equivalent form with the same constraints:

\min_{\beta,\beta_0}\ \sum_{i=k}^{n}\xi_i + \frac{\lambda}{2}\beta^T\beta \qquad (3)

where the parameter λ corresponds to 1/C in (2). This formulation is also called the Loss+Penalty criterion [3]. In this paper, we will use the latter formulation in explaining our algorithm, and thus our goal of choosing the cost parameter C is cast as choosing the parameter λ (we call it the regularization parameter).

For (3), we can construct the Lagrange primal function

L = \sum_{i=k}^{n}\xi_i + \frac{\lambda}{2}\beta^T\beta + \sum_{i=1}^{n}\alpha_i(1 - y_i f(x_i)) - \sum_{i=k}^{n}\alpha_i\xi_i - \sum_{i=k}^{n}\gamma_i\xi_i \qquad (4)

and set the derivatives to zero. We have:

\partial L/\partial\beta:\ \ \beta = \frac{1}{\lambda}\sum_{i=1}^{n}\alpha_i y_i x_i \qquad (5)

\partial L/\partial\beta_0:\ \ \sum_{i=1}^{n}\alpha_i y_i = 0 \qquad (6)

\partial L/\partial\xi_i:\ \ 1 - \alpha_i - \gamma_i = 0,\ i \in I_- \qquad (7)

along with the KKT conditions:

\alpha_i(1 - y_i f(x_i)) = 0,\ i \in I_+ \qquad (8)

\alpha_i(1 - y_i f(x_i) - \xi_i) = 0,\ i \in I_- \qquad (9)

\gamma_i\,\xi_i = 0,\ i \in I_- \qquad (10)

Substituting (5)-(7) into (4), we obtain the Lagrange dual form:

\max_\alpha\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2\lambda}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\, x_i^T x_j
\text{s.t. } \alpha_i \ge 0,\ i \in I_+;\quad 0 \le \alpha_i \le 1,\ i \in I_-;\quad \sum_{i=1}^{n}\alpha_i y_i = 0 \qquad (11)
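The dual (11) can be sanity-checked numerically. The sketch below evaluates the dual objective for given multipliers and verifies the constraints that distinguish positive from unlabeled points (α unbounded above on I_+, boxed in [0, 1] on I_-). The two-point data set and the multiplier values are made-up illustrations.

```python
def pu_dual_objective(alpha, X, y, positive, lam):
    """Dual objective of (11): sum(alpha_i) - (1/(2*lam)) *
    sum_ij alpha_i*alpha_j*y_i*y_j*<x_i, x_j>, plus a constraint check:
    alpha_i >= 0 on I+, 0 <= alpha_i <= 1 on I-, sum(alpha_i*y_i) = 0."""
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    value = sum(alpha) - quad / (2.0 * lam)
    ok = (all(alpha[i] >= 0 for i in range(n) if positive[i])
          and all(0 <= alpha[i] <= 1 for i in range(n) if not positive[i])
          and abs(sum(a * yi for a, yi in zip(alpha, y))) < 1e-9)
    return value, ok

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]                     # one positive, one unlabeled-as-negative
alpha = [1.0, 1.0]               # feasible multipliers (illustrative)
value, ok = pu_dual_objective(alpha, X, y, positive=[True, False], lam=2.0)
```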

We see that for positive examples we have the constraints α_i ≥ 0 (i ∈ I_+), and for unlabeled examples the constraints 1 ≥ α_i ≥ 0 (i ∈ I_-). When y_i f(x_i) = 1, the point x_i is on the margin (and is called a Support Vector). For the positive support vectors we have α_i > 0, while for the unlabeled support vectors we have 1 > α_i > 0. When y_i f(x_i) > 1, the point x_i is outside the margin; in this case, both positive and unlabeled examples have α_i = 0. When y_i f(x_i) < 1, the point x_i is inside the margin (since we are dealing with the noiseless case, only unlabeled examples can lie inside the margin); when this occurs, unlabeled examples have α_i = 1. Our goal now is to solve equation (11) so as to find the entire solution path for all possible λ ≥ 0.

2. Our Approach: The Entire Generalization Path for SVM

We propose an algorithm for solving equation (11). Our basic idea is as follows. As in [3], we start with λ large and decrease it toward zero, keeping track of all the events that occur along the way. Different from [3], as the initial value of λ we propose to carefully select a large one, with which we can construct an initial hyper-plane with all positive examples correctly classified. Then, as λ decreases, ||β|| increases (see formula (5)), and hence the width of the margin decreases. As this width decreases, points move from being inside to outside the margin (in our problem, only unlabeled examples are inside the margin at the beginning). Their corresponding α_i change from α_i = 1 when they are inside the margin (y_i f(x_i) < 1) to α_i = 0 when they are outside the margin (y_i f(x_i) > 1). By continuity, those points must linger on the margin (y_i f(x_i) = 1) while their α_i decrease from 1 to 0. In this process we always keep all the positive examples correctly classified. Positive points can only leave the margin to outside the margin,

with their corresponding α_i > 0 changing to α_i = 0, or join the margin from the outside with α_i = 0 changing to α_i > 0.

In our approach, each point x_i belongs to one of the following three sets. (For convenience, we denote the α of positive examples as α_p = {α_i | i ∈ I_+} and likewise α_n = {α_i | i ∈ I_-}.)

M = {i: y_i f(x_i) = 1, α_p > 0, 1 > α_n > 0}, with M = M_p ∪ M_n, for Margin, where M_p denotes positive points on the margin (f(x) = 1) and M_n denotes unlabeled ones on the margin (f(x) = -1).
L = {i: y_i f(x_i) < 1, α_n = 1}, L for inside the margin; only unlabeled points.
R = {i: y_i f(x_i) > 1, α_i = 0}, R for outside the margin.

Points in the various sets may undergo various events (an event can be viewed as the action of leaving one set to enter another). By tracking all the events across iterations, we can solve the entire generalization path and select the best solution. One of the key issues here is how to define the events for efficient learning. We give the detailed definitions of the events in Section 3.3.

3. Algorithm: PU-SvmPath

We have implemented our algorithm, PU-SvmPath, by extending SvmPath [3], which is used for finding the entire generalization path for the conventional Support Vector Machine. In this paper, we only consider the noiseless case of positive and unlabeled classification, to simplify the description. However, the proposed algorithm can also be extended to the noisy case (allowing noise in the labeled positive examples). The related issues are what we are currently researching, and will be reported elsewhere.

3.1. Outline

The input of our algorithm is a training set which consists of positive and unlabeled examples. The objective is to find a PU-SVM model (including β and λ in (3), or α and λ in the Lagrange form implicit in (11)).

Algorithm: PU-SvmPath
Step 1. Initialization: determine the initial values of α and λ, and thus β and β_0;
  1. α_n = 1, i ∈ I_-;
  2. Find the values of α_p by solving min_α ||β*(α)||^2; (see Section 3.2)
  3. Calculate λ_0, β, and β_0 using the initial values of α;
Step 2. Find the entire generalization path; (see Section 3.3)
  4. do {
  5.   Build all possible events based on the current values of the parameters;
  6.   For each event, calculate the new λ by supposing it occurs;
  7.   Select the largest λ < λ_l from all the events as λ_{l+1};
  8.   Update α with λ_{l+1}, using the fact that the α_i are piecewise-linear in λ;
  9.   Update the sets M, L, and R;
  10. } while (terminal conditions are not satisfied).

Figure 1. The Algorithm of PU-SvmPath

In our algorithm PU-SvmPath, we propose solving the entire regularization path in the following steps:

(1) Initialization. This step determines the initial values for β, α, and λ. With the initial values, we can establish the initial state of the three point sets defined above. When determining the initial values, we need to consider the fact that n_- is much larger than n_+, and the constraint that all positive examples should be correctly classified. Moreover, we need to satisfy constraint (6). We then employ a quadratic programming algorithm to obtain the initial configuration.

(2) Finding the entire generalization path. The algorithm runs iteratively. In each iteration, we try to find a new λ_{l+1} based on the current value λ_l (l denotes the l-th iteration). It searches the space of possible events for the event which has the largest λ < λ_l. It then establishes λ_{l+1} and

hence updates α with λ_{l+1}, according to the fact that the α_i are piecewise-linear in the regularization parameter λ.

(3) Termination. The algorithm runs until some terminal conditions are satisfied.

Figure 1 summarizes the proposed algorithm. In the rest of this section, we explain the three steps in detail.

3.2. Initialization

In initialization, the task is to find the initial values for β_0, α, and λ. We denote the initial λ as λ_0. In order to find an initial value for λ, we first consider all unlabeled examples inside the margin (note: the larger the λ, the larger the margin). Thus for all unlabeled examples (i ∈ I_-), all the ξ_i > 0 and γ_i = 0, and hence α_i = 1. In this case, when β = 0, the optimum choice for β_0 is 1, and the loss is Σ_{i=k}^{n} ξ_i = 2n_-.

However, we require that (6) holds. We formalize the initialization problem as one of optimization. Here we use the Lagrange dual form (11) of the PU-SVM as the objective function. At the starting point, all unlabeled examples have their α_i = 1; together with the KKT condition Σ_{i=1}^{n} α_i y_i = 0, the first part of the objective function results in Σ_{i=1}^{n} α_i = 2n_-, which is a constant. Let β* = βλ. From (5), we have

\beta^*(\alpha) = \sum_{i=1}^{n}\alpha_i y_i x_i \qquad (12)

We can then obtain a new objective function from (11) with constraints as follows:

\min_\alpha\ \|\beta^*(\alpha)\|^2
\text{s.t. } \alpha_i \ge 0,\ i \in I_+;\quad \alpha_i = 1,\ i \in I_-;\quad \sum_{i\in I_+}\alpha_i = n_- \qquad (13)
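As an illustration of the initialization problem (13), the sketch below pins all unlabeled multipliers at 1 and, for the special case of exactly two positive points, parameterizes the positive multipliers as (t, n_- - t) so that the equality constraint (and hence (6)) holds automatically; it then minimizes ||β*(α)||^2 by a crude grid search. The data and the grid search are illustrative simplifications of the quadratic programming step, not the paper's solver.

```python
def beta_star(alphas_pos, pos, unl):
    """beta*(alpha) = sum_i alpha_i * y_i * x_i (formula (12));
    unlabeled alphas are all 1 with y = -1."""
    d = len(pos[0])
    b = [0.0] * d
    for a, x in zip(alphas_pos, pos):
        for k in range(d):
            b[k] += a * x[k]
    for x in unl:
        for k in range(d):
            b[k] -= x[k]
    return b

def init_alphas(pos, unl, steps=1000):
    """Grid-search sketch of (13) for exactly two positive points:
    alpha_p = (t, n_minus - t) keeps sum(alpha_p) = n_minus, so the
    equality constraint holds; minimize ||beta*||^2 over t in [0, n_minus]."""
    n_minus = len(unl)
    best = None
    for s in range(steps + 1):
        t = n_minus * s / steps
        b = beta_star((t, n_minus - t), pos, unl)
        norm2 = sum(v * v for v in b)
        if best is None or norm2 < best[0]:
            best = (norm2, t, b)
    return best

pos = [(2.0, 2.0), (2.0, 0.0)]                    # illustrative positives
unl = [(-2.0, -2.0), (-1.0, -2.0), (-2.0, 0.0)]   # illustrative unlabeled
norm2, t, b = init_alphas(pos, unl)
```

With more positive points, the same feasible set is searched by a proper QP solver rather than a one-dimensional grid.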

Solving this problem, we get the values of α_i (i ∈ I_+). We now establish the starting point λ_0 and β_0. We first calculate β* by (12). As mentioned above, each α_p (i ∈ I_+) is either 0 or α_p > 0. Suppose α_{i+} > 0 for some positive point i+ (which is then on the margin). Let i- = argmin_{i∈I_-} β*^T x_i. Then the points i+ and i- are on the positive margin (β^T x_{i+} + β_0 = 1) and the negative margin (β^T x_{i-} + β_0 = -1) respectively. From the following equations

\frac{1}{\lambda}\beta^{*T}x_{i+} + \beta_0 = 1,\qquad \frac{1}{\lambda}\beta^{*T}x_{i-} + \beta_0 = -1 \qquad (14)

we have

\lambda_0 = \frac{\beta^{*T}x_{i+} - \beta^{*T}x_{i-}}{2},\qquad \beta_0 = -\frac{\beta^{*T}x_{i+} + \beta^{*T}x_{i-}}{\beta^{*T}x_{i+} - \beta^{*T}x_{i-}} \qquad (15)

3.3. Finding λ_{l+1}

As mentioned above, the points fall into three sets: M, L, and R. For each set of points, there are several possible events. We define the events as follows:

a. The initial event, in which two or more points enter the margin. This event happens at the beginning of the algorithm (initialization) or when the set M is empty.
b. A point leaves R to enter M, with its value of α_i initially 0.
c. A point i ∈ I_- leaves L to enter M, with its value of α_i initially 1.
d. One or more points in M leave to join R.
e. One or more points i ∈ I_- in M enter L.

Whichever the case, for continuity reasons the sets stay stable until an event occurs. For a point in R, only one event can occur, i.e. event b; two events can occur for a point in M, i.e. events d and e; one event (event c) can occur for a point in L (only unlabeled points). At the beginning of the algorithm, or when the set M is empty, event a will occur.
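The bookkeeping behind these events can be sketched as two small pure functions: one assigns a point to M, L, or R from its current margin value y_i f(x_i), and the other lists which of the events b-e are admissible for it (positive points never enter L). The tolerance used to detect points on the margin is an illustrative assumption.

```python
def point_set(yf, tol=1e-8):
    """Classify a point by its margin value y_i*f(x_i): on the margin (M),
    inside it (L), or outside it (R)."""
    if abs(yf - 1.0) <= tol:
        return "M"
    return "L" if yf < 1.0 else "R"

def admissible_events(s, is_positive):
    """Events b-e from the text; positive points are confined to M and R."""
    if s == "R":
        return ["b"]                                  # leave R to enter M
    if s == "M":
        return ["d"] if is_positive else ["d", "e"]   # leave M for R or L
    return ["c"]                                      # unlabeled: leave L for M

states = [point_set(v) for v in (1.0, 0.2, 1.7)]
```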

While staying in a stable situation, i.e. while α changes linearly with λ as implied by equation (21), we can enumerate the possible events of all the points. For each event, we calculate the new value of λ and hence the new α. We select the event that has the largest λ < λ_l and use its λ as λ_{l+1}. Then we can update α. The process continues until some terminal conditions are satisfied.

The key point then is how to compute the new value λ_{l+1} for an event. Consider a point passing from R through M to L: its α_i changes from 0 to 1, and vice versa (points in I_+ are constrained to stay in M or R only). When the point x_i is on the margin, we have y_i f(x_i) = 1. In this way, we can establish a path for each α_i. We adopt the method proposed in [3] to compute the λ_{l+1} for an event. We now explain the method in detail.

We use the subscript l to denote the l-th event that occurred. Suppose |M_l| = m, and let α_i^l, β_0^l, and λ_l be the values of these parameters at the point of entry; likewise, f_l is the function at this point. For convenience we define α_0 = λβ_0, and hence α_0^l = λ_l β_0^l. Then we have

f(x) = \frac{1}{\lambda}\Big(\sum_{j=1}^{n} y_j\alpha_j\,(x\cdot x_j) + \alpha_0\Big) \qquad (16)

For λ_l > λ > λ_{l+1}, we can write

f(x) = \frac{1}{\lambda}\big[\lambda f(x) - \lambda_l f_l(x)\big] + \frac{\lambda_l}{\lambda} f_l(x)
     = \frac{1}{\lambda}\Big[\sum_{j\in M}(\alpha_j - \alpha_j^l)\,y_j\,(x\cdot x_j) + (\alpha_0 - \alpha_0^l) + \lambda_l f_l(x)\Big] \qquad (17)

The second line follows because all the unlabeled points in L have their α_i = 1, and those in R have their α_i = 0, for this range of λ. Since each of the m points x_i ∈ M is to stay on the margin, we have y_i f(x_i) = 1. According to (17), we have

y_i f(x_i) = \frac{1}{\lambda}\Big[\sum_{j\in M}(\alpha_j - \alpha_j^l)\,y_i y_j\,(x_i\cdot x_j) + y_i(\alpha_0 - \alpha_0^l) + \lambda_l\Big] = 1 \qquad (18)

Writing δ_j = α_j^l - α_j, from (18) we have

\sum_{j\in M}\delta_j\,y_i y_j\,(x_i\cdot x_j) + y_i\,\delta_0 = \lambda_l - \lambda,\quad \forall i \in M \qquad (19)
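Equation (19), together with the sum constraint that follows from (6), forms the (m+1) x (m+1) linear system solved at each step of the path. Dividing both sides by (λ_l - λ) yields the directions b_j directly (right-hand side 1 for the margin rows, 0 for the sum row), after which candidate event values of λ follow from (24). The sketch below assumes a tiny margin set and uses naive Gaussian elimination; both are illustrative choices.

```python
# Sketch of one path step: solve for the directions b_j, i.e.
#   sum_j b_j*y_i*y_j*<x_i,x_j> + y_i*b_0 = 1  for each i in M,
#   sum_j b_j*y_j = 0,
# then score margin-exit events via (24).  The two-point margin set and
# the naive solver are illustrative assumptions.

def solve(A, rhs):
    """Naive Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]

def path_direction(Xm, ym):
    """Return (b0, [b_j]) for the current margin set."""
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    m = len(Xm)
    A, rhs = [], []
    for i in range(m):                  # rows from (19), divided by (lam_l - lam)
        A.append([ym[i]] + [ym[i] * ym[j] * dot(Xm[i], Xm[j]) for j in range(m)])
        rhs.append(1.0)
    A.append([0.0] + list(ym))          # sum row from (6)
    rhs.append(0.0)
    sol = solve(A, rhs)
    return sol[0], sol[1:]

def exit_lambdas(alpha_l, b, lam_l):
    """Candidate lambdas from (24) for a margin point reaching alpha=1 or 0."""
    return ((1.0 - alpha_l) / b + lam_l, (0.0 - alpha_l) / b + lam_l)

Xm = [(1.0, 0.0), (-1.0, 0.5)]   # illustrative margin points
ym = [+1, -1]
b0, b = path_direction(Xm, ym)
```

The candidate λ closest to, but below, λ_l over all admissible events then becomes λ_{l+1}.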

Furthermore, since at all times we require that (6) holds, we have

\sum_{j\in M}\delta_j\,y_j = 0 \qquad (20)

Equations (19) and (20) constitute m+1 linear equations in the m+1 unknowns δ_j. We can obtain δ_j = (λ_l - λ)b_j, j ∈ {0} ∪ M, where b_j is obtained by solving the equations. Hence

\alpha_j = \alpha_j^l - (\lambda_l - \lambda)\,b_j,\quad j \in \{0\}\cup M \qquad (21)

Equation (21) means that for λ_l > λ > λ_{l+1}, the α_j change linearly in λ. We can also write (17) as

f(x) = \frac{\lambda_l}{\lambda}\big[f_l(x) - h_l(x)\big] + h_l(x) \qquad (22)

where

h_l(x) = \sum_{j\in M} y_j\,b_j\,(x\cdot x_j) + b_0 \qquad (23)

Thus the function changes in an inverse manner in λ for λ_l > λ > λ_{l+1}.

We have now obtained an important property: the α_j change linearly in λ between two events (they are piecewise-linear), which enables us to easily establish λ for each event from (21) and (22):

- when one of the points x_j leaves M to enter L or R, we have α_j = 1 or α_j = 0 respectively. According to (21), we can compute its λ by

\lambda = \frac{1-\alpha_j^l}{b_j} + \lambda_l \quad\text{or}\quad \lambda = \frac{0-\alpha_j^l}{b_j} + \lambda_l \qquad (24)

- when one of the points from L or R enters M, we have y_i f(x_i) = 1. Substituting this into (22), we can obtain its λ by

\lambda = \frac{\lambda_l\,\big[f_l(x_i) - h_l(x_i)\big]}{y_i - h_l(x_i)} \qquad (25)

In this way, we obtain a λ for each possible event and are thus able to select the largest λ < λ_l as λ_{l+1}. We then make use of the property that the α_j change piecewise-linearly in λ to obtain all α_i.

3.4. Termination

In positive and unlabeled learning, λ runs all the way down to zero. For this to happen without f blowing up in (22), we must have f_l - h_l = 0, so that the boundary and margins remain fixed at a point where Σξ_i is as small as possible, and the margin is as wide as possible subject to this constraint.

3.5. Kernels

The proposed algorithm can easily be extended to the more general kernel form. In the kernel case, we replace the inner product (x·x_j) in (16) by a kernel function K:

f(x) = \frac{1}{\lambda}\sum_{j=1}^{n} y_j\,\alpha_j\,K(x, x_j) + \beta_0 \qquad (26)

This general kernel case makes our algorithm support non-linear classification problems.

4. Experiments

4.1. Datasets and Experiment Setup

4.1.1. Datasets

We evaluated our proposed method on two synthetic data sets and two real-world data sets. We first constructed two synthetic data sets: PU_toy1 and PU_toy2. Both of them are two-dimensional. PU_toy1 consists of 100 positive examples and 100 negative examples, generated from two multivariate normal distributions with mean [2,2]^T and covariance [2,0; 0,2] for positive examples, and mean [-2,-2]^T and covariance [2,0; 0,2] for negative examples. PU_toy2 consists of 200 positive examples and 200 negative examples, also generated from two multivariate normal distributions, with mean [2,-3]^T and covariance [2,0; 0,2] for positive points, and mean [-3,2]^T and covariance [2,0; 0,2] for negative points.

We also carried out experiments on two real-world data sets: a bio-medical data set, Types of Diffuse Large B-cell Lymphoma (DLBCL) 2, and a text classification data set, 20newsgroups 3. The former describes the distinct types of diffuse large B-cell lymphoma (DLBCL), the most common subtype of non-Hodgkin's lymphoma, using gene expression data. There are 47 examples; 24 of them are from the germinal centre B-like group, while 23 are from the activated B-like group. Each example is described by 4026 genes [19]. For the text classification data, we chose eight categories from 20newsgroups to create a data set. The eight categories are: ms-windows.misc, graphics, pc.hardware, misc.forsale, hockey, christian, sci.crypt, rec.autos. Each category contains 100 documents.

4.1.2. Experiment Setup

In all experiments, we conducted evaluation in terms of the F1-measure, defined as F1 = 2PR / (P + R), where P and R represent precision and recall respectively. We compared against existing methods: SvmPath [3], SVM-light [5], Biased-SVM [6], and One-class SVM [11]. SvmPath is proposed for fitting the entire regularization path for the classical SVM, not for PU classification; we compare against SvmPath to indicate the necessity of the proposed method. SVM-light is also designed for the classical SVM and requires the user to provide a value for the cost parameter; we use the default values as the parameters in SVM-light. For SvmPath and SVM-light, we view the unlabeled examples as negative examples. Biased-SVM targets PU classification. It takes into consideration both labeled and unlabeled examples, and requires the user to provide two cost parameters, for labeled positive examples and unlabeled examples respectively. In [6] the authors propose employing cross-validation to

2 http://sdmc.lit.org.sg/gedatasets/datasets.html#dlbcl

select the best cost parameters, which obviously results in high computational cost. In our experiments, we use a typical method that considers the numbers of positive and unlabeled examples to determine the values of the cost parameters. One-class SVM is appropriate for one-class classification (it tries to learn a classifier from the positive data only). Table 1 indicates the methods we used to set the values of the cost parameters in the experiments.

Table 1. The Methods for Setting the Cost Parameters

Method       Cost Parameter
SVM-light    C = 1/avg^2
Biased-SVM   C1 = log(n_-)/(avgp^2 * log(n_+));  C2 = 1/avgn^2

In the table, C, C1, and C2 denote the cost parameters. avg, avgp, and avgn respectively represent the average norm of all examples, of the positive examples, and of the negative examples. n_+ and n_- are the numbers of positive and negative examples (note that in PU classification, we take the unlabeled examples as negative).

4.2. Experimental Results

4.2.1. Results on the Synthetic Datasets

We conducted the experiments as follows. We split each data set into two subsets of the same size, one for training and one for test. In the training set, γ percent of the positive points were randomly selected as labeled positive examples, and the rest of the positive examples and the negative examples were viewed as unlabeled examples. We ranged γ from 20% to 80% (0.2-0.8) to create a wide range of test cases.

Table 2 shows the experimental results on the synthetic data sets. In the table, SvmPath, SVM-light, Biased-SVM, and One-class SVM respectively represent the methods introduced above. PU-SvmPath denotes our method. PU_toy1-20% denotes that we

3 http://people.csail.mit.edu/jrennie/20newsgroups/

use 20% of the positive examples in PU_toy1 as labeled positive examples and the others as unlabeled examples (including unlabeled positive examples and negative examples), and likewise for the other test cases. We see from the table that the proposed PU-SvmPath significantly outperforms the other methods in most of the test cases, especially with few labeled positive examples (20% and 40%). When γ (the ratio of labeled positive examples) increases to 80%, all of the methods obtain good results.

Table 2. Average F1-scores on the Synthetic Datasets (%)

Test Case     SvmPath  PU-SvmPath  SVM-light  Biased-SVM  One-class SVM
PU_toy1-20%   35.48    94.23        0.00       3.85       41.27
PU_toy1-40%   82.35    95.24        0.00      60.27       61.11
PU_toy1-60%   85.06    96.15       88.89      95.83       48.48
PU_toy1-80%   96.91    92.59       96.97      96.97       48.48
PU_toy2-20%   95.00    98.52        0.00       0.00       49.62
PU_toy2-40%   99.00    98.52        0.00       0.00       62.07
PU_toy2-60%   92.47    97.56       95.29      95.83       63.95
PU_toy2-80%   99.50    97.09       99.50      99.50       63.01

The method of using SVM-light with one default cost parameter only works well in the cases that have enough labeled positive examples, and cannot come up with any results when there are only few labeled positive examples, for example in the cases of PU_toy1-20%, PU_toy1-40%, PU_toy2-20%, and PU_toy2-40%. We also note that when there are sufficient labeled data, say γ = 0.8, SVM-light achieves the best performance (96.97% on PU_toy1 and 99.50% on PU_toy2).

Biased-SVM achieves better results than SVM-light. However, the performance of the resulting models is sensitive to the data sets. For example, Biased-SVM obtains 60.27% (F1-score) on PU_toy1 with γ = 0.4, but cannot yield any results on the other data set, PU_toy2, with

the same γ. The performance is also sensitive to the values of the cost parameters; with different values, the performance may vary largely.

One-class SVM can learn a classifier with only positive examples; however, it cannot take advantage of the unlabeled data. Its performance is poorer than that of PU-SvmPath. It is also inferior to SvmPath, SVM-light, and Biased-SVM, although when γ is small (0.2 and 0.3) it outperforms SVM-light and Biased-SVM.

[Figure 2 here: six scatter plots. The upper row shows hyper-planes generated by SvmPath, the lower row those generated by PU-SvmPath (51, 61, and 135 steps respectively), on (a) PU_toy1-20%, (b) PU_toy1-60%, and (c) PU_toy2-40%.]

Figure 2. Hyper-planes generated by SvmPath and PU-SvmPath

SvmPath can fit the entire regularization path, so it can find the best value of the cost parameter. However, SvmPath is proposed for classical classification, not for PU classification. From Table 2, we can see that in most of the test cases the proposed PU-SvmPath significantly outperforms SvmPath. We made a detailed analysis to compare the two algorithms. Figure 2

Figure 2 shows the hyper-planes learned by SvmPath and PU-SvmPath in three test cases: PU_toy1-20%, PU_toy1-60%, and PU_toy2-40%. In the figures, *, o, and x indicate labeled positive examples, unlabeled positive examples, and unlabeled negative examples, respectively. The upper three figures show the hyper-planes generated by SvmPath and the lower three show those generated by PU-SvmPath. We see that in all three test cases, PU-SvmPath constructs more accurate hyper-planes (see Figure 2(a)) and more regular hyper-planes (see Figures 2(b) and 2(c)) than SvmPath. The major problem of SvmPath in PU classification is that it treats all the unlabeled examples as negative ones, which results in a highly unbalanced classification task (only a few positive examples against a large number of negative examples). This causes the hyper-planes constructed by SvmPath to move toward the positive examples (see Figures 2(a) and (b)).

4.2.2. Results on the Biomedical Dataset

The SVM is popular in situations where the number of features exceeds the number of examples. The biomedical data set (DLBCL) is exactly such a case: it has 4,026 features but only 47 examples. Here one typically fits a linear classifier. We argue that the proposed PU-SvmPath can play an important role for these kinds of data. In DLBCL, we have two categories, each containing half of the examples. The task is to classify an example into one of the two categories (equally sized positive and negative classes). With the two categories, we then have two test cases, taking one category as positive and the other as negative. For each test case, we split the data set into two subsets of the same size, one for training and one for testing.
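The splitting protocol used throughout the experiments — keep a γ fraction of the positives as labeled, pool the rest with the negatives as unlabeled — can be sketched as follows. The function is a hypothetical helper mirroring the described procedure, not code from the paper:

```python
import random

def pu_split(positives, negatives, gamma, seed=0):
    """Keep a gamma fraction of the positives as labeled positive
    examples; pool the remaining positives with all negatives as
    the unlabeled set."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    n_labeled = int(round(gamma * len(pos)))
    labeled_pos = pos[:n_labeled]
    unlabeled = pos[n_labeled:] + list(negatives)
    return labeled_pos, unlabeled
```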

In the training data set, γ percent of the positive points were randomly selected as labeled positive examples, and the rest of the positive examples together with the negative examples were viewed as unlabeled examples. For γ, we only used 50% and 75% (0.5 and 0.75), because the number of examples is limited. Table 3 shows the experimental results on the biomedical dataset. The gene-expression data set is small and has far more features than examples, which makes it a difficult classification problem.

Table 3. Average F1-scores on Biomedical Dataset (%)

γ       SvmPath   PU-SvmPath   SVM-light   Bias-SVM   One-class SVM
0.50      43.48      47.06        0.00       15.38         0.00
0.75      41.67      73.68       28.57       58.82         0.00

The results show that PU-SvmPath significantly outperforms SvmPath (+3.58% when γ is 0.5 and +32.01% when γ is 0.75) and also significantly outperforms SVM-light. SVM-light cannot learn a good classifier using the default cost parameter, which confirms the necessity of choosing the cost parameter. PU-SvmPath outperforms Bias-SVM as well. One-class SVM cannot produce any results on this data.

4.2.3. Results on the Text Classification Dataset

We illustrate our algorithm on another real-world data set: 20newsgroup. The data set consists of eight categories, each containing 100 documents. The task is to classify a document into one of the eight categories. We adopt the one-class-versus-all-others approach, i.e., we take one class as positive and the other classes as negative, which gives eight document classification tasks. For each document, we employ tokenization, stop-word filtering, and stemming. For each classification task, we split the data set into two subsets of the same size, one for training and one for testing.
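The document preprocessing pipeline (tokenization, stop-word filtering, stemming) can be sketched as below. The tiny stop-word list and the naive suffix stripper are our stand-ins for the paper's actual resources (e.g. a full stop-word list and a Porter-style stemmer):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}

def preprocess(text):
    """Tokenize, drop stop words, then apply crude suffix stripping."""
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word filtering
    def stem(token):
        # naive stand-in for a real stemmer (e.g. Porter)
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token
    return [stem(t) for t in tokens]
```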

In the training data set, as in the experiments above, γ percent of the positive points were randomly selected as labeled positive examples, and the rest of the positive documents together with the negative documents were viewed as unlabeled examples. We ranged γ from 10% to 90% (0.1-0.9), creating nine test cases. Table 4 shows the experimental results on 20newsgroup. The results show that in most of the test cases, PU-SvmPath significantly outperforms SvmPath, SVM-light, and One-class SVM. However, we should note that PU-SvmPath is only comparable with Bias-SVM when γ is small (from 0.1 to 0.5) and is inferior to Bias-SVM when γ increases. This is because the 20newsgroup data contains noisy examples, so the γ percent of labeled positive examples may themselves include noisy examples; so far, PU-SvmPath can only handle the noiseless case. The results also indicate that methods using two cost parameters can better model the imbalance between positive and negative examples in the training data.

Table 4. Average F1-scores on 20newsgroup (%)

γ      SvmPath   PU-SvmPath   SVM-light   Bias-SVM   One-class SVM
0.1      13.72      11.10        3.92       11.09        14.27
0.2      29.88      21.58        4.46       21.85        21.95
0.3      40.33      35.83        3.46       36.60        24.06
0.4      39.55      45.34        7.60       50.43        24.62
0.5      56.15      55.22       13.02       60.42        24.45
0.6      57.87      60.17       24.63       71.00        24.81
0.7      67.88      69.72       43.35       80.51        24.66
0.8      68.47      78.20       65.06       87.79        25.05
0.9      73.54      85.45       76.81       93.22        24.63

4.3. Discussion

Our work is inspired by Hastie's piecewise-linear solution path for SVMs. As [18] points out, many models share a piecewise-linear relationship between the coefficient path and the cost parameter C (or 1/λ). The SVM for positive and unlabeled data is a special case of the SVM model, and this biased model also has the piecewise-linear property. However, the difference lies in that we should keep all positive examples correctly classified.
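The two-cost-parameter idea referred to above amounts to weighting errors on labeled positives and on unlabeled examples differently in the SVM objective. A sketch of the weighted hinge-loss term under that assumption (plain Python, our illustration rather than the paper's formulation):

```python
def weighted_hinge_loss(w, b, examples, c_pos, c_neg):
    """Sum of hinge losses for a linear scorer w.x + b, with cost
    c_pos on labeled positives (y = +1) and c_neg on unlabeled
    examples treated as negatives (y = -1)."""
    total = 0.0
    for x, y in examples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        cost = c_pos if y == 1 else c_neg
        total += cost * max(0.0, 1.0 - margin)
    return total
```

Setting c_pos much larger than c_neg penalizes misclassified positives heavily, which is the mechanism behind keeping the (scarce) positive examples on the correct side of the margin.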

The hyper-planes obtained by the different algorithms show that the constraint keeping positive examples outside the margin does affect the final SVM solution. Applying the piecewise-linear algorithm to real data still needs more investigation. We will further examine the differences between the solutions obtained by PU-SvmPath and those of the traditional SVM model on the text classification dataset.

5. Conclusion

In this paper, we investigated the issue of fitting the entire solution path for the SVM in positive and unlabeled classification. We proposed an algorithm which can determine the cost parameter automatically while training the SVM model. Experimental results on synthetic data and real-world data show that our approach can outperform the existing methods. As future work, we intend to extend the proposed algorithm to the noisy case, where positive examples can be noisy (e.g., mistakenly labeled). We intend to use two parameters C+ and C- to control the errors on positive and negative examples, respectively. One of the key points is to investigate the properties of the two parameters with respect to the Lagrange multipliers α.

6. References

[1] Cortes C and Vapnik V. 1995. Support-vector networks. Machine Learning, Vol. 20, 1995, pp. 273-297.
[2] Hastie T, Tibshirani R, and Friedman J. 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York, 2001.
[3] Hastie T, Rosset S, Tibshirani R, and Zhu J. 2004. The entire regularization path for the support vector machine. Journal of Machine Learning Research, (5):1391-1415, 2004.
[4] Li X and Liu B. 2003. Learning to classify text using positive and unlabeled data. In Proc. of IJCAI 2003.
[5] Joachims T. 1999. Making large scale SVM learning practical. In: Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors. MIT Press.
[6] Liu B, Dai Y, Li X, Lee W, and Yu P. 2003. Building text classifiers using positive and unlabeled examples. In Proc. of ICDM 2003.

[7] Yu H. 2003. SVMC: Single-class classification with support vector machines. In Proc. of IJCAI 2003.
[8] Yu H, Zhai C, and Han J. 2003. Text classification from positive and unlabeled documents. In Proc. of CIKM 2003, pp. 232-239.
[9] Liu B, Lee W, Yu P, and Li X. 2002. Partially supervised classification of text documents. In Proc. of ICML 2002, pp. 387-394.
[10] Yu H, Han J, and Chang K C. 2002. PEBL: Positive example based learning for Web page classification using SVM. In Proc. of ACM SIGKDD (KDD 2002), ACM Press, New York, 2002, pp. 239-248.
[11] Manevitz L and Yousef M. 2001. One-class SVMs for document classification. Journal of Machine Learning Research, 2, pp. 139-154.
[12] Morik K, Brockhausen P, and Joachims T. 1999. Combining statistical learning with a knowledge-based approach - A case study in intensive care monitoring. In Proc. of ICML 1999, pp. 268-277.
[13] Li Y and Shawe-Taylor J. 2003. The SVM with uneven margins and Chinese document categorization. In Proc. of PACLIC 2003, pp. 216-227.
[14] Pontil M and Verri A. 1998. Properties of support vector machines. Neural Computation, 10(4):955-974, 1998.
[15] Gunter L and Zhu J. 2005. Computing the solution path for the regularized support vector regression. In Proc. of NIPS 2005.
[16] Wang G, Yeung D Y, and Lochovsky F H. 2006. Two-dimensional solution path for support vector regression. In Proc. of ICML 2006, Pittsburgh, PA, USA, pp. 993-1000.
[17] Cauwenberghs G and Poggio T. 2001. Incremental and decremental support vector machine learning. In Proc. of NIPS 2001, Cambridge, MA, 2001.
[18] Rosset S and Zhu J. 2003. Piecewise linear regularized solution paths. Technical report, Stanford University.
[19] Alizadeh A A, Eisen M B, et al. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403(6769):503-511.