A Framework for Modeling Positive Class Expansion with Single Snapshot

1 A Framework for Modeling Positive Class Expansion with Single Snapshot. Yang Yu and Zhi-Hua Zhou, LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University, China

4 Motivating task: the evolution of the mobile telecom network from 1G to 2G to 3G. We are at the moment of moving towards 3G, and we want to predict which 2G users will turn to 3G.

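8 Analysis of the task. Timeline of events: 2G starts, 2G dominates, 3G starts, 3G dominates. Class distribution: the model is trained on the class distribution at one point on this timeline, but what we want to predict is the class distribution at a later point. This is the positive class expansion with single snapshot (PCES) problem.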

9 Outline: A new data mining problem: PCES; Why we need the PCES problem; A solution to the PCES problem; Results; Conclusion

11 Formulation of classical learning. A training set $D = \{x_i\}_{i=1}^{n}$ of i.i.d. instances is drawn from a distribution $p(x)$, and the labeling function $p(y \mid x)$ is fixed. A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p(y \mid x))$ to minimize $\widehat{err} = \mathbb{E}_{x \sim p(x)}\, L\big(\hat{f}(x; D, p(y \mid x)),\, p(y \mid x)\big)$. This formulation cannot model a changing labeling function.

12 Formulation of PCES. The labeling function is $p_{tr}(y \mid x)$ at training time and $p_{te}(y \mid x)$ at testing time. A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p_{tr}(y \mid x))$ to minimize $\widehat{err} = \mathbb{E}_{x \sim p(x)}\, L\big(\hat{f}(x; D, p_{tr}(y \mid x)),\, p_{te}(y \mid x)\big)$, subject to the constraint $\forall x:\ p_{te}(y = y_t \mid x) \ge p_{tr}(y = y_t \mid x)$, where $y_t$ is the target (positive) class. For convenience, we assume $y \in \{-1, +1\}$ and $\forall x:\ p_{te}(y = 1 \mid x) \ge p_{tr}(y = 1 \mid x)$.
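The expansion constraint on this slide can be checked directly when both labeling functions are known on a set of instances. The sketch below is a minimal illustration, assuming each labeling function is given as a list of positive-class probabilities $p(y = 1 \mid x)$ over the same instances; the function name `satisfies_expansion` is hypothetical.

```python
def satisfies_expansion(p_tr, p_te, tol=1e-12):
    """Check the PCES constraint p_te(y=1|x) >= p_tr(y=1|x) for every instance.

    p_tr, p_te: positive-class probabilities for the same instances at
    training and testing time (hypothetical inputs for illustration).
    """
    if len(p_tr) != len(p_te):
        raise ValueError("p_tr and p_te must describe the same instances")
    return all(te >= tr - tol for tr, te in zip(p_tr, p_te))


# Deterministic labels are the special case where p(y=1|x) is 0 or 1.
train_pos = [1, 0, 0, 1, 0]
test_pos = [1, 1, 0, 1, 0]  # one negative has become positive
print(satisfies_expansion(train_pos, test_pos))  # True: the positive class expanded
print(satisfies_expansion(test_pos, train_pos))  # False: the positive class shrank
```

Note that the constraint only forbids positives from turning negative; it says nothing about how many negatives turn positive, which is exactly what a single snapshot cannot reveal.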

14 Another example. Positive class: hot items; negative class: items that are not hot. The positive class is expanding and only one snapshot is available, so this is a PCES problem.

16 Further example. Positive class: hot items; negative class: items that are not hot. The positive class is expanding and only one snapshot is available, so this is a PCES problem.

17 Outline: A new data mining problem: PCES; Why we need the PCES problem; A solution to the PCES problem; Results; Conclusion

18 Related learning frameworks: PU-Learning (learning with positive and unlabeled data), concept drift, covariate shift

19 PU-Learning. Setting: only positive instances and unlabeled instances are in the training data. Assumption: the positive instances are representative of the positive class concept [Liu et al, ICML02][Yu et al, KDD02]. PCES: the positive class is expanding, so PU-Learning cannot capture the expanded class concept.

20 Concept Drift. Setting: instances arrive sequentially, batch by batch, and the target concept may change in the coming batch. Assumption: a series of data samples is available for drift detection [Klinkenberg & Joachims, ICDM00][Kolter & Maloof, ICML03]. PCES: only a single snapshot is available, so concept drift approaches are not applicable.

21 Covariate Shift (or sample selection bias [Shimodaira, JSPI00]). Setting: training and test instances are drawn from different distributions, i.e., the input distribution $p(x)$ is changing. Assumption: the labeling function $p(y \mid x)$ is fixed. PCES: $p(x)$ is fixed but $p(y \mid x)$ is changing, so covariate shift approaches are not applicable.

22 Outline: A new data mining problem: PCES; Why we need the PCES problem; A solution to the PCES problem; Results; Conclusion

23 The proposed approach: learn from pure data; incorporate preference bias; combined objective; optimized by SGBDota

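24 Learn from pure data. Observation: a desired learner ranks positive training instances higher than negative training instances, which is exactly what the AUC (area under the ROC curve) criterion expresses. With $D^+$ and $D^-$ the positive and negative training instances: $L_{auc}(f) = 1 - \frac{1}{|D^+|\,|D^-|} \sum_{x^+ \in D^+} \sum_{x^- \in D^-} I\big(f(x^+) > f(x^-)\big)$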
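The AUC loss on this slide is one minus the fraction of (positive, negative) pairs that the scorer orders correctly. A minimal sketch, assuming the scorer `f` is any Python callable and the two classes are given as lists; the toy scorer and inputs are hypothetical.

```python
def auc_loss(f, pos, neg):
    """L_auc(f) = 1 - (1/(|D+||D-|)) * sum over pairs of I(f(x+) > f(x-)).

    Ties f(x+) == f(x-) are counted as mis-ranked pairs.
    """
    correct = sum(1 for xp in pos for xn in neg if f(xp) > f(xn))
    return 1.0 - correct / (len(pos) * len(neg))


score = lambda x: x  # hypothetical scorer: rank 1-D inputs by their value
print(auc_loss(score, pos=[3.0, 2.5], neg=[1.0, 2.0]))  # 0.0: every positive ranked first
print(auc_loss(score, pos=[3.0, 1.5], neg=[1.0, 2.0]))  # 0.25: one of four pairs mis-ranked
```

The indicator makes this loss piecewise constant in the scores, which is why a smoothed version is needed before gradient-based optimization.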

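25 Learn from pure data. Smoothed loss function: $L_{auc}(f) = \frac{1}{|D^+|}\frac{1}{|D^-|} \sum_{x^+ \in D^+} \sum_{x^- \in D^-} \big(1 + e^{f(x^+) - f(x^-)}\big)^{-1}$. Instance-wise loss function: $L_{auc}(f, x) = \begin{cases} \frac{1}{|D^-|} \sum_{x' \in D^-} \big(1 + e^{f(x) - f(x')}\big)^{-1}, & x \in D^+ \\ \frac{1}{|D^+|} \sum_{x' \in D^+} \big(1 + e^{f(x') - f(x)}\big)^{-1}, & x \in D^- \end{cases}$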
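The smoothed loss replaces the indicator with $(1 + e^{d})^{-1}$, where $d = f(x^+) - f(x^-)$, so it is close to 0 for a well-ordered pair and close to 1 for a badly ordered one. The sketch below works directly on score lists. Normalizing the instance-wise loss by the size of the opposite class is our assumption; under it, the instance-wise losses summed over all of $D$ equal $|D|$ times the smoothed $L_{auc}$, which the last line checks. All function names are hypothetical.

```python
import math


def rank_sigmoid(d):
    """Numerically stable (1 + e^d)^(-1): near 1 for d << 0, near 0 for d >> 0."""
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def smooth_auc_loss(s_pos, s_neg):
    """Smoothed L_auc from the scores f(x+) of positives and f(x-) of negatives."""
    total = sum(rank_sigmoid(sp - sn) for sp in s_pos for sn in s_neg)
    return total / (len(s_pos) * len(s_neg))


def instance_auc_loss(s, is_positive, s_pos, s_neg):
    """Instance-wise L_auc(f, x) for an instance with score s = f(x)."""
    if is_positive:
        return sum(rank_sigmoid(s - sn) for sn in s_neg) / len(s_neg)
    return sum(rank_sigmoid(sp - s) for sp in s_pos) / len(s_pos)


s_pos, s_neg = [2.0, 0.5], [0.0, 1.0, -1.0]
total = sum(instance_auc_loss(s, True, s_pos, s_neg) for s in s_pos) + sum(
    instance_auc_loss(s, False, s_pos, s_neg) for s in s_neg
)
print(abs(total - 5 * smooth_auc_loss(s_pos, s_neg)) < 1e-12)  # True
```

The two-branch form of `rank_sigmoid` avoids overflow in `math.exp` for large score gaps.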

26 Incorporate preference bias. The user can provide preferences either by indicating preferences on randomly sampled instance pairs, or by applying a priori rules that indicate the preferences. Either way, we obtain a preference function $k(x_a, x_b) = \begin{cases} 1, & x_a \text{ is preferred} \\ -1, & x_b \text{ is preferred} \\ 0, & \text{equal or unknown} \end{cases}$ and the loss function $L_{pref}(f) = 1 - \frac{1}{|D|^2} \sum_{x_a \in D} \sum_{x_b \in D} I\big((f(x_a) - f(x_b))\, k(x_a, x_b) > 0\big)$.
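The preference loss counts the ordered pairs whose score difference agrees in sign with $k$. A minimal sketch, assuming $k$ is supplied as a Python callable returning 1, -1 or 0; the "prefer larger inputs" rule below is a hypothetical a priori rule, not one from the talk.

```python
def pref_loss(f, data, k):
    """L_pref(f) = 1 - (1/|D|^2) * sum_a sum_b I((f(x_a) - f(x_b)) * k(x_a, x_b) > 0).

    Pairs with k = 0 (including x_a paired with itself) never satisfy the
    indicator, so they only add a constant and do not change the minimizer.
    """
    agree = sum(1 for a in data for b in data if (f(a) - f(b)) * k(a, b) > 0)
    return 1.0 - agree / len(data) ** 2


# Hypothetical a priori rule: larger inputs are preferred.
k = lambda a, b: (a > b) - (a < b)
data = [0.0, 1.0, 2.0]
print(pref_loss(lambda x: x, data, k))   # about 0.333 (= 1 - 6/9): all 6 distinct ordered pairs agree
print(pref_loss(lambda x: -x, data, k))  # 1.0: every preference is violated
```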

27 Incorporate preference bias. Smoothed loss function: $L_{pref}(f) = \frac{1}{|D|^2} \sum_{x_a \in D} \sum_{x_b \in D} \big(1 + e^{(f(x_a) - f(x_b))\, k(x_a, x_b)}\big)^{-1}$. Instance-wise loss function: $L_{pref}(f, x) = \frac{1}{|D|} \sum_{x_a \in D} \big(1 + e^{(f(x) - f(x_a))\, k(x, x_a)}\big)^{-1}$
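A sketch of the smoothed and instance-wise preference losses, assuming the preferences are precomputed into a matrix `K[a][b] = k(x_a, x_b)` and the scores into a list. With the $1/|D|$ normalization on this slide, the instance-wise losses sum to $|D|$ times the smoothed $L_{pref}$, which the last line checks. The data, the rule in `K` and the function names are hypothetical.

```python
import math


def rank_sigmoid(d):
    """Numerically stable (1 + e^d)^(-1)."""
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def smooth_pref_loss(scores, K):
    """Smoothed L_pref from scores[i] = f(x_i) and K[a][b] = k(x_a, x_b)."""
    n = len(scores)
    total = sum(
        rank_sigmoid((scores[a] - scores[b]) * K[a][b]) for a in range(n) for b in range(n)
    )
    return total / n**2


def instance_pref_loss(i, scores, K):
    """Instance-wise L_pref(f, x_i) = (1/|D|) sum_a (1 + e^{(f(x_i) - f(x_a)) k(x_i, x_a)})^(-1)."""
    n = len(scores)
    return sum(rank_sigmoid((scores[i] - scores[a]) * K[i][a]) for a in range(n)) / n


xs = [0.0, 1.0, 2.0, 3.0]
K = [[(a > b) - (a < b) for b in xs] for a in xs]  # hypothetical rule: prefer larger x
scores = [0.1, 0.4, 0.3, 2.0]
total = sum(instance_pref_loss(i, scores, K) for i in range(len(xs)))
print(abs(total - len(xs) * smooth_pref_loss(scores, K)) < 1e-12)  # True
```

Pairs with $k = 0$ contribute the constant $1/2$ each, so even a perfect ordering keeps a small positive loss; this shifts the objective but not its minimizer.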

28 Combine the two objectives. The combined loss function is $L(f) = L_{auc}(f) + L_{pref}(f)$, so the learning problem is $\hat{f} = \arg\min_f L(f) = \arg\min_f \big(L_{auc}(f) + L_{pref}(f)\big)$.
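Gradient-based optimization of the combined objective needs its derivative with respect to each score $f(x_i)$. The sketch below computes the smoothed $L_{auc} + L_{pref}$ and its gradient using $\frac{d}{dd}(1 + e^{d})^{-1} = -s(1 - s)$ with $s = (1 + e^{d})^{-1}$, then verifies the gradient by central finite differences. The toy scores, labels and preference matrix are hypothetical; `K` must be antisymmetric, as any preference function of the form on slide 26 is.

```python
import math


def rank_sigmoid(d):
    """Numerically stable (1 + e^d)^(-1); its derivative is -s(1 - s)."""
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def combined_loss_and_grad(s, y, K):
    """Smoothed L(f) = L_auc(f) + L_pref(f) and its gradient w.r.t. the scores.

    s[i] = f(x_i), y[i] in {+1, -1}, K[a][b] = k(x_a, x_b).
    """
    n = len(s)
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    loss, grad = 0.0, [0.0] * n
    c = 1.0 / (len(pos) * len(neg))
    for i in pos:  # L_auc term over positive/negative pairs
        for j in neg:
            v = rank_sigmoid(s[i] - s[j])
            loss += c * v
            g = -c * v * (1.0 - v)  # derivative w.r.t. (s_i - s_j)
            grad[i] += g
            grad[j] -= g
    c = 1.0 / n**2
    for a in range(n):  # L_pref term over all ordered pairs
        for b in range(n):
            v = rank_sigmoid((s[a] - s[b]) * K[a][b])
            loss += c * v
            g = -c * v * (1.0 - v) * K[a][b]
            grad[a] += g
            grad[b] -= g
    return loss, grad


# Finite-difference check on a toy problem (hypothetical data).
s, y = [0.3, -0.2, 0.8, 0.1], [1, -1, 1, -1]
K = [[0, 1, 0, -1], [-1, 0, 0, 0], [0, 0, 0, 1], [1, 0, -1, 0]]
loss, grad = combined_loss_and_grad(s, y, K)
h = 1e-6
for i in range(len(s)):
    up = s[:i] + [s[i] + h] + s[i + 1:]
    dn = s[:i] + [s[i] - h] + s[i + 1:]
    num = (combined_loss_and_grad(up, y, K)[0] - combined_loss_and_grad(dn, y, K)[0]) / (2 * h)
    print(i, abs(num - grad[i]) < 1e-6)
```

At all-zero scores every smoothed term equals 1/2, so the combined loss starts at exactly 1.0; the negative of this gradient is what a boosting step would try to fit.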

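29 Optimization: Gradient Boosting [Friedman, AnnStat01, CSDA02]. The target $f^* = \arg\min_f \mathbb{E}_{x,y}\, L(f(x), y)$ is approximated by an additive model $F(x) = \sum_{t=0}^{T} \beta_t h(x; \theta_t)$, where $(\beta_t, \theta_t) = \arg\min_{\beta, \theta} L\big(F_{t-1} + \beta h(\cdot\,; \theta)\big)$. This is solved in two steps: first fit the base learner to the negative gradient, $\theta_t = \arg\min_{\theta} \sum_{x \in D} \Big(-\frac{\partial L(f(x))}{\partial f(x)}\Big|_{f(x) = F_{t-1}(x)} - h(x; \theta)\Big)^2$, then choose the step size $\beta_t = \arg\min_{\beta} L\big(F_{t-1} + \beta h(\cdot\,; \theta_t)\big)$.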
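As an illustration of the generic two-step procedure, the sketch below runs gradient boosting with the squared loss $L = \frac{1}{2}(y - f(x))^2$, for which the negative gradient is the residual $y - F_{t-1}(x)$ and the line search for $\beta_t$ has a closed form. The 1-D regression stump used as $h(x; \theta)$, the fixed `T`, and the function names are assumptions for illustration, not the base learner used in the talk.

```python
def fit_stump(xs, targets):
    """Least-squares 1-D regression stump: h(x) = left if x <= thr else right."""
    best = None
    for thr in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv


def boost_squared_loss(xs, ys, T=50):
    """Gradient boosting for L = (y - f)^2 / 2 with stumps as h(x; theta)."""
    F = [0.0] * len(xs)  # F_0 = 0
    ensemble = []
    for _ in range(T):
        r = [y - f for y, f in zip(ys, F)]  # negative gradient at F_{t-1}
        h = fit_stump(xs, r)  # theta_t: least-squares fit to the negative gradient
        hx = [h(x) for x in xs]
        denom = sum(v * v for v in hx)
        if denom == 0.0:
            break  # residuals are already zero
        beta = sum(a * b for a, b in zip(r, hx)) / denom  # exact line search for squared loss
        ensemble.append((beta, h))
        F = [f + beta * v for f, v in zip(F, hx)]
    return lambda x: sum(beta * h(x) for beta, h in ensemble)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
model = boost_squared_loss(xs, [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(model(1.0), model(4.0))  # 0.0 1.0
```

For non-quadratic losses such as the smoothed AUC loss, the line search has no closed form and must be done numerically.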

30 Optimization. Gradient Boosting [Friedman, AnnStat01, CSDA02] fits a single target $y$, but we need to fit both $y$ and the preference $k$.

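31 Optimization with double targets: SGBDota (Stochastic Gradient Boosting with DOuble TArgets). The target $f^* = \arg\min_f \big(L_{auc}(f) + L_{pref}(f)\big)$ is approximated by $F(x) = \sum_{t=0}^{T} \big(\beta_{t,1} h_1(x; \theta_{t,1}) + \beta_{t,2} h_2(x; \theta_{t,2})\big)$, where $(\beta_{t,1}, \theta_{t,1}, \beta_{t,2}, \theta_{t,2}) = \arg\min L\big(F_{t-1} + \beta_1 h_1(\cdot\,; \theta_1) + \beta_2 h_2(\cdot\,; \theta_2)\big)$. Each base learner fits one target: $\theta_{t,1} = \arg\min_{\theta} \sum_{x \in D} \Big(-\frac{\partial L_{auc}(f(x))}{\partial f(x)}\Big|_{f(x) = F_{t-1}(x)} - h_1(x; \theta)\Big)^2$ and $\theta_{t,2} = \arg\min_{\theta} \sum_{x \in D} \Big(-\frac{\partial L_{pref}(f(x))}{\partial f(x)}\Big|_{f(x) = F_{t-1}(x)} - h_2(x; \theta)\Big)^2$; the step sizes are $(\beta_{t,1}, \beta_{t,2}) = \arg\min_{\beta_1, \beta_2} L\big(F_{t-1} + \beta_1 h_1(\cdot\,; \theta_{t,1}) + \beta_2 h_2(\cdot\,; \theta_{t,2})\big)$.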
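The double-target update can be sketched as follows: each round, one stump fits the negative gradient of the smoothed $L_{auc}$, a second stump fits that of the smoothed $L_{pref}$, and the two step sizes are chosen jointly. Several choices here are our own assumptions, not details from the talk: 1-D stumps as base learners, a random subsample of `rate` times the data per round (in the spirit of Friedman's stochastic gradient boosting), a coarse grid search over $(\beta_1, \beta_2)$ in place of an exact line search, and the hypothetical "prefer larger x" rule in the usage example.

```python
import math
import random


def rank_sigmoid(d):
    """Numerically stable (1 + e^d)^(-1)."""
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def losses_and_grads(s, y, K):
    """Smoothed L_auc and L_pref with their gradients w.r.t. scores s[i] = F(x_i)."""
    n = len(s)
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    l_auc, g_auc = 0.0, [0.0] * n
    c = 1.0 / (len(pos) * len(neg))
    for i in pos:
        for j in neg:
            v = rank_sigmoid(s[i] - s[j])
            l_auc += c * v
            g_auc[i] -= c * v * (1.0 - v)
            g_auc[j] += c * v * (1.0 - v)
    l_pref, g_pref = 0.0, [0.0] * n
    c = 1.0 / n**2
    for a in range(n):
        for b in range(n):
            v = rank_sigmoid((s[a] - s[b]) * K[a][b])
            l_pref += c * v
            g_pref[a] -= c * v * (1.0 - v) * K[a][b]
            g_pref[b] += c * v * (1.0 - v) * K[a][b]
    return l_auc, g_auc, l_pref, g_pref


def fit_stump(xs, targets):
    """Least-squares 1-D regression stump: h(x) = left if x <= thr else right."""
    best = None
    for thr in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv


def sgbdota_sketch(xs, y, K, T=30, rate=0.8, betas=(0.0, 0.1, 0.5, 1.0), seed=0):
    """Boosting with two targets: h1 fits -dL_auc/dF, h2 fits -dL_pref/dF."""
    rng = random.Random(seed)
    n = len(xs)
    F, model = [0.0] * n, []
    for _ in range(T):
        _, g_auc, _, g_pref = losses_and_grads(F, y, K)
        idx = rng.sample(range(n), max(2, int(rate * n)))  # stochastic subsample
        sub = [xs[i] for i in idx]
        h1 = fit_stump(sub, [-g_auc[i] for i in idx])
        h2 = fit_stump(sub, [-g_pref[i] for i in idx])
        u, w = [h1(x) for x in xs], [h2(x) for x in xs]
        # Grid line search for (beta_1, beta_2) on L_auc + L_pref; (0, 0) is tried first.
        best = None
        for b1 in betas:
            for b2 in betas:
                cand = [f + b1 * a + b2 * c for f, a, c in zip(F, u, w)]
                la, _, lp, _ = losses_and_grads(cand, y, K)
                if best is None or la + lp < best[0]:
                    best = (la + lp, b1, b2, cand)
        _, b1, b2, F = best
        if b1 == 0.0 and b2 == 0.0:
            break  # no step on the grid reduces the loss
        model.append((b1, h1, b2, h2))
    return lambda x: sum(b1 * h1(x) + b2 * h2(x) for b1, h1, b2, h2 in model)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, -1, 1, 1, 1]
K = [[(a > b) - (a < b) for b in xs] for a in xs]  # hypothetical: prefer larger x
model = sgbdota_sketch(xs, y, K)
la, _, lp, _ = losses_and_grads([model(x) for x in xs], y, K)
print(la + lp < 1.0)  # True: below the loss 1.0 of the all-zero start
```

Because the grid always contains $(0, 0)$, the combined training loss never increases from one round to the next.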

32 SGBDota: learn from pure data; incorporate preference bias; combined objective; optimized by SGBDota

33 Outline: A new data mining problem: PCES; Why we need the PCES problem; A solution to the PCES problem; Results; Conclusion

34 Data sets. A synthetic data set and 4 UCI data sets: postoperative, segment, veteran, pbc. Evaluation method: 2/3 of the data as training data and 1/3 as test data, repeated over 20 random splits.
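The evaluation protocol can be sketched as a generator of index splits. This is a minimal illustration assuming a fixed random seed and rounding of the 2/3 cut-off; `random_splits` is a hypothetical name, and how the talk drew its splits beyond "2/3 train, 1/3 test, 20 repeats" is not specified.

```python
import random


def random_splits(n, repeats=20, train_frac=2 / 3, seed=0):
    """Yield (train_idx, test_idx) for `repeats` random train/test splits of n instances."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = round(train_frac * n)
        yield sorted(idx[:cut]), sorted(idx[cut:])


splits = list(random_splits(90))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 20 60 30
```

In the PCES setting, the training indices would be labeled with the training positive class and the test indices with the expanded testing positive class.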

35 Data sets (cont.). Dataset name: postoperative. Description: patient state after operation. Original classes: ICU, general hospital floor, prepare to go home. Positive class for training: ICU. Positive class for testing: ICU + general hospital floor, since some patients on the general hospital floor will be sent to the ICU.

36 Data sets (cont.). Dataset name: segment. Description: outdoor images. Original classes: brickface, sky, cement, window, path, foliage, and grass. Positive class for training: grass. Positive class for testing: grass + foliage + path (moving focus).

37 Data sets (cont.). Dataset name: veteran. Description: lung cancer trial data. Original class: survival time. Positive class for training: survival time < 12 hours. Positive class for testing: survival time < 24 hours (predict future victims).

38 Data sets (cont.). Dataset name: pbc. Description: primary biliary cirrhosis trial data. Original class: living time. Positive class for training: living time < 365 days. Positive class for testing: living time < 1460 days (predict future victims).
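For the veteran and pbc data sets, the training and testing positive classes come from two cut-offs on the same time attribute. A minimal sketch of that relabeling, assuming labels in $\{-1, +1\}$; the living times below are hypothetical, and `pces_labels` is an illustrative name.

```python
def pces_labels(times, train_cut, test_cut):
    """Label an instance positive (+1) if its time is below the cut-off, else -1.

    A larger testing cut-off can only turn negatives into positives, so the
    testing positive class contains the training positive class.
    """
    if train_cut > test_cut:
        raise ValueError("the positive class must expand: train_cut <= test_cut")
    y_tr = [1 if t < train_cut else -1 for t in times]
    y_te = [1 if t < test_cut else -1 for t in times]
    return y_tr, y_te


times = [100, 400, 900, 2000]  # hypothetical living times in days
y_tr, y_te = pces_labels(times, train_cut=365, test_cut=1460)
print(y_tr)  # [1, -1, -1, -1]
print(y_te)  # [1, 1, 1, -1]
```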

39 Comparing methods: GetEnsemble, the only existing approach for PCES; Random Forests, a classical learning approach; PU-SVM, a PU-Learning approach; SGBAUC, a degenerate version of SGBDota that does not use domain knowledge; and random guess, a trivial baseline.

40 SGBDota configuration. For the UCI data sets, we try three preferences; the first two are reasonable for most tasks. SGBDota-1: the positive class expands from dense positive areas to sparse positive areas. SGBDota-2: the positive class expands from dense positive areas to sparse positive areas and sparse negative areas. SGBDota-3: the positive class expands linearly along the neighborhoods.
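One way to turn the SGBDota-1 preference into a concrete preference function $k(x_a, x_b)$ is to estimate how dense the positive class is around each instance and prefer the instance lying in the denser positive area. The sketch below is our own hypothetical reading of that preference, not the rule used in the experiments: the density is the fraction of positive training instances among the `n_neighbors` nearest neighbours under Euclidean distance, and both function names are made up for illustration.

```python
import math


def positive_density(data, labels, n_neighbors=3):
    """Fraction of positives among each instance's nearest neighbours (itself excluded)."""
    dens = []
    for i, x in enumerate(data):
        order = sorted(range(len(data)), key=lambda j: math.dist(x, data[j]))
        nbrs = [j for j in order if j != i][:n_neighbors]
        dens.append(sum(1 for j in nbrs if labels[j] == 1) / len(nbrs))
    return dens


def density_preference(dens):
    """K[a][b] = 1 if x_a lies in a denser positive area, -1 if x_b does, 0 if equal."""
    return [[(da > db) - (da < db) for db in dens] for da in dens]


data = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
labels = [1, 1, -1, -1, -1, -1]
K = density_preference(positive_density(data, labels, n_neighbors=2))
print(K[2][4], K[4][2], K[4][5])  # 1 -1 0: the negative near positives is preferred
```

The resulting matrix `K` is antisymmetric, so it can be plugged directly into the preference loss as $k(x_a, x_b)$.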

41-45 Results on Synthetic Data [figure panels: Random Forests, PU-SVM, SGBAUC, SGBDota-1]

46 Results on UCI data sets using the first two preferences
AUC values of SGBDota, GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF) and Random:
Dataset        SGBDota-1  SGBDota-2  GetEnsemble  SGBAUC  PU-SVM  RF  Random
postoperative  .470±…     …          …            …       …       …   …±.148
segment        .821±…     …          …            …       …       …   …±.018
veteran        .658±…     …          …            …       …       …   …±.069
pbc            .721±…     …          …            …       …       …   …±.043
t-test results (win/tie/loss counts):
           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-1  2/2/0        2/2/0   1/3/0   1/3/0  3/1/0
SGBDota-2  2/2/0        2/2/0   2/2/0   2/2/0  3/1/0
SGBDota with a reasonable preference is better: it never loses to any compared method.
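The win/tie/loss counts aggregate one t-test per data set over the 20 random splits. The sketch below assumes a paired, two-sided t-test at the 5% level (critical value 2.093 for 19 degrees of freedom); the talk does not state the exact test or significance level, so both are assumptions, as are the function names and the AUC values in the example.

```python
import math
import statistics

T_CRIT = 2.093  # assumed: two-sided 5% critical value of Student's t, 19 degrees of freedom


def paired_t_outcome(a, b, t_crit=T_CRIT):
    """'win', 'tie' or 'loss' for method A vs. method B from paired per-split AUCs."""
    d = [x - y for x, y in zip(a, b)]
    mean, sd = statistics.fmean(d), statistics.stdev(d)
    if sd == 0.0:
        return "tie" if mean == 0.0 else ("win" if mean > 0 else "loss")
    t = mean / (sd / math.sqrt(len(d)))
    if t > t_crit:
        return "win"
    if t < -t_crit:
        return "loss"
    return "tie"


def win_tie_loss(auc_a, auc_b):
    """Aggregate per-data-set outcomes into a 'win/tie/loss' string."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for name in auc_a:
        counts[paired_t_outcome(auc_a[name], auc_b[name])] += 1
    return f"{counts['win']}/{counts['tie']}/{counts['loss']}"


# Hypothetical AUCs over 20 splits on two data sets.
base = [0.5] * 20
better = [0.80 + 0.01 * (i % 2) for i in range(20)]
similar = [0.5 + (0.01 if i % 2 else -0.01) for i in range(20)]
print(win_tie_loss({"d1": better, "d2": similar}, {"d1": base, "d2": base}))  # 1/1/0
```

With four data sets, each row of counts in the table sums to 4.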

47 Results on UCI data sets: what about a less reasonable preference?
AUC values of SGBDota-3, GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF) and Random:
Dataset        SGBDota-3  GetEnsemble  SGBAUC  PU-SVM  RF  Random
postoperative  .459±…     …            …       …       …   …±.148
segment        .744±…     …            …       …       …   …±.018
veteran        .544±…     …            …       …       …   …±.069
pbc            .638±…     …            …       …       …   …±.043
t-test results (win/tie/loss counts):
           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-3  0/2/2        0/2/2   0/2/2   0/2/2  2/2/0
The preference must not be misleading.

48 Outline: A new data mining problem: PCES; Why we need the PCES problem; A solution to the PCES problem; Results; Conclusion

49 Conclusions. Main contribution: a new data mining problem, PCES, which exists in many real-world applications and is not well handled by current techniques; and an initial solution to it. Future work: better solutions; real applications.

50 THANK YOU
