Interval Data Classification under Partial Information: A Chance-constraint Approach

Size: px

Start display at page:

Download "Interval Data Classification under Partial Information: A Chance-constraint Approach"

Marshall Martin
6 years ago
Views:

1 Interval Data Classification under Partial Information: A Chance-constraint Approach Sahely Bhadra J. Saketha Nath Aharon Ben-Tal Chiranjib Bhattacharyya PAKDD 2009 J. Saketha Nath (PAKDD 09) Conference Seminar 1 / 15

2 Data Uncertainty Real-world data fraught with uncertainties, noise. Measurement errors, non-zero least counts etc. Inherent heterogenity: Bio-medical data e.g. Micro-array, cancer diagnostic data. Computational/Respresentational convinience. J. Saketha Nath (PAKDD 09) Conference Seminar 2 / 15

3 Data Uncertainty Real-world data fraught with uncertainties, noise. Measurement errors, non-zero least counts etc. Inherent heterogenity: Bio-medical data e.g. Micro-array, cancer diagnostic data. Computational/Respresentational convinience. Many datasets provide partial information regarding noise. e.g., Wisconsin breast cancer datasets (support, mean, std. err.) Micro-array datasets (replicates) J. Saketha Nath (PAKDD 09) Conference Seminar 2 / 15

4 Data Uncertainty Real-world data fraught with uncertainties, noise. Measurement errors, non-zero least counts etc. Inherent heterogenity: Bio-medical data e.g. Micro-array, cancer diagnostic data. Computational/Respresentational convinience. Many datasets provide partial information regarding noise. e.g., Wisconsin breast cancer datasets (support, mean, std. err.) Micro-array datasets (replicates) Classifiers accounting for uncertainty generalize better. J. Saketha Nath (PAKDD 09) Conference Seminar 2 / 15

5 Problem Definition Problem: Assume partial information regarding uncertainties given: bounding intervals (i.e. support) and means of uncertain eg. Make no distributional assumptions. Construct classifier that generalizes well. J. Saketha Nath (PAKDD 09) Conference Seminar 3 / 15

6 Existing Methodology [Laurent El Ghaoui et.al., 2003]: Utilize support alone; neglect statistical information True datapoint lies somewhere in bounding hyper-rectangle Construct regular SVM J. Saketha Nath (PAKDD 09) Conference Seminar 4 / 15

7 Existing Methodology [Laurent El Ghaoui et.al., 2003]: Utilize support alone; neglect statistical information True datapoint lies somewhere in bounding hyper-rectangle Construct regular SVM SVM Formulation: 1 min w,b,ξ 2 w C i ξ i i s.t. y i (w x i b) 1 ξ i, ξ i 0 J. Saketha Nath (PAKDD 09) Conference Seminar 4 / 15

8 Existing Methodology [Laurent El Ghaoui et.al., 2003]: Utilize support alone; neglect statistical information True datapoint lies somewhere in bounding hyper-rectangle Construct regular SVM IC-BH Formulation: 1 min w,b,ξ 2 w C i ξ i i s.t. y i (w x i b) 1 ξ i, ξ i 0, x i R i J. Saketha Nath (PAKDD 09) Conference Seminar 4 / 15

9 Limitations of Existing Methodologies Neglect useful statistical information regarding uncertainty Overly-conservative uncertainty modeling leads to less margin Poor generalization J. Saketha Nath (PAKDD 09) Conference Seminar 5 / 15

10 Limitations of Existing Methodologies Neglect useful statistical information regarding uncertainty Overly-conservative uncertainty modeling leads to less margin Poor generalization Proposed Methodology: Use both support and statistical information Employ Chance-Constraint Program (CCP) approaches Relax CCP using Bernstein bounding schemes Not overly-conservative better margin and generalization Leads to convex Second Order Cone Program (SOCP) J. Saketha Nath (PAKDD 09) Conference Seminar 5 / 15

11 Proposed Formulation SVM: 1 min w,b,ξ 2 w C i ξ i i s.t. y i (w x i b) 1 ξ i, ξ i 0 J. Saketha Nath (PAKDD 09) Conference Seminar 6 / 15

12 Proposed Formulation SVM: 1 min w,b,ξ 2 w C i ξ i i s.t. y i (w X i b) 1 ξ i, ξ i 0 J. Saketha Nath (PAKDD 09) Conference Seminar 6 / 15

13 Proposed Formulation Chance-Constrained Program: 1 min w,b,ξ 2 w C i ξ i i s.t. Prob { y i (w X i b) 1 ξ i } ǫ, ξi 0 J. Saketha Nath (PAKDD 09) Conference Seminar 6 / 15

14 Proposed Formulation Chance-Constrained Program: 1 min w,b,ξ 2 w C i ξ i i s.t. Prob { y i (w X i b) 1 ξ i } ǫ, ξi 0 Assumptions: X i R i. E[X i ] are known. X ij, j = 1,...,n are independent random variables. J. Saketha Nath (PAKDD 09) Conference Seminar 6 / 15

15 Convex Relaxation Comments: In general, difficult to solve such CCPs. Construct an efficient relaxation: Employ Bernstein schemes to upper bound probability Constrain the upper-bound to be less than ǫ J. Saketha Nath (PAKDD 09) Conference Seminar 7 / 15

16 Convex Relaxation Comments: In general, difficult to solve such CCPs. Construct an efficient relaxation: Employ Bernstein schemes to upper bound probability Constrain the upper-bound to be less than ǫ Key Question: } Prob {y i (w X i b) 1 ξ i? J. Saketha Nath (PAKDD 09) Conference Seminar 7 / 15

17 Convex Relaxation Comments: In general, difficult to solve such CCPs. Construct an efficient relaxation: Employ Bernstein schemes to upper bound probability Constrain the upper-bound to be less than ǫ Key Question: } Prob {y i (w X i b) 1 ξ i? ǫ J. Saketha Nath (PAKDD 09) Conference Seminar 7 / 15

18 Convex Relaxation Comments: In general, difficult to solve such CCPs. Construct an efficient relaxation: Employ Bernstein schemes to upper bound probability Constrain the upper-bound to be less than ǫ Key Question: Prob u ij X ij + u i0 0? j J. Saketha Nath (PAKDD 09) Conference Seminar 7 / 15

19 Bernstein Bounding Markov Bounding: Prob (X 0)? J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

20 Bernstein Bounding Markov Bounding: E X [1 X 0 ]? J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

21 Bernstein Bounding Markov Bounding: E X [1 X 0 ] E[exp {αx}] α 0 J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

22 Bernstein Bounding Markov Bounding: E X [1 X 0 ] E[exp {αx}] α 0 = E exp α u ij X ij + u i0 j = exp {u i0 } Π j E[exp {αu ij X ij }] J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

23 Bernstein Bounding Markov Bounding: E X [1 X 0 ] E[exp {αx}] α 0 = E exp α u ij X ij + u i0 j = exp {u i0 } Π j E[exp {αu ij X ij }] J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

24 Bernstein Bounding Markov Bounding: E X [1 X 0 ] E[exp {αx}] α 0 = E exp α u ij X ij + u i0 j = exp {u i0 } Π j E[exp {αu ij X ij }] Bounding Expectation: Given X R, E[X], tightly bound: E[exp {tx}], t R J. Saketha Nath (PAKDD 09) Conference Seminar 8 / 15

25 Bernstein Bounding Contd. Known Result: { } µij t + σ(ˆµ ij ) 2 lij 2 E[exp{tX ij }] exp t 2 2 t R (1) J. Saketha Nath (PAKDD 09) Conference Seminar 9 / 15

26 Bernstein Bounding Contd. Known Result: { } µij t + σ(ˆµ ij ) 2 lij 2 E[exp{tX ij }] exp t 2 2 t R (1) Analogous with Gaussian mgf Variance term varies with relative position of mean! J. Saketha Nath (PAKDD 09) Conference Seminar 9 / 15

27 Bernstein Bounding Contd. Known Result: { } µij t + σ(ˆµ ij ) 2 lij 2 E[exp{tX ij }] exp t 2 2 t R (1) Analogous with Gaussian mgf Variance term varies with relative position of mean! Proof Sketch: Support (a X b), mean are known. exp{tx} b X X a b a exp{ta} + b a exp{tb} Taking expectations on both sides leads to: { } a + b E[exp {tx}] exp 2 t + h(lt), h(z) log (cosh(z) + ˆµ sinh(z)) { } exp µt + σ (ˆµ)2 l 2 t 2 2 institution-logo J. Saketha Nath (PAKDD 09) Conference Seminar 9 / 15

28 Main Result A Convex Formulation IC-MBH Formulation: min w,b,z i,ξ i w C i ξ i s.t. y i (w µ i b) + z i ˆµ i 1 ξ i + z i 1 + κ Σ i (y i L i w + z i ) 2 J. Saketha Nath (PAKDD 09) Conference Seminar 10 / 15

29 Geometric Interpretation x 2 axis x axis 1 Figure: Figure showing bounding hyper-rectangle and uncertainty sets for different positions of mean. Mean and boundary of uncertainty set marked with same color. J. Saketha Nath (PAKDD 09) Conference Seminar 11 / 15 institution-logo

30 Classification of Uncertain Datapoints Labeling: Support y pr = sign(w m i b) Mean y pr = sign(w µ i b) Replicates y pr is majority label of replicates J. Saketha Nath (PAKDD 09) Conference Seminar 12 / 15

31 Classification of Uncertain Datapoints Labeling: Support y pr = sign(w m i b) Mean y pr = sign(w µ i b) Replicates y pr is majority label of replicates Error Measures: Nominal Error Calculate ǫ opt from Bernstein bounding 1 if y i y pr i OptErr i = ǫ opt if y i = y pr i and R(a i,b i ) cuts opt. hyp. 0 else (2) J. Saketha Nath (PAKDD 09) Conference Seminar 12 / 15

32 Numerical Experiments Table: Table comparing NomErr (NE) and OptErr (OE) obtained with IC-M, IC-R, IC-BH and IC-MBH. Data IC-M IC-R IC-BH IC-MBH NE OE NE OE NE OE NE OE 10 U β A-F A-S A-T F-S F-T S-T WDBC J. Saketha Nath (PAKDD 09) Conference Seminar 13 / 15

33 Conclusions Novel methodology for interval-valued data classification under partial information. Employs support as well as statistical information Idea pose the problem as CCP and relax using Bernstein bounds Bernstein bounds lead to less conservative noise modeling Better classification margin and generalization ability Empirical results show 50% decrease in generalization error Exploitation of Bernstein bounding techniques in learning has a promise. J. Saketha Nath (PAKDD 09) Conference Seminar 14 / 15

34 THANK YOU J. Saketha Nath (PAKDD 09) Conference Seminar 15 / 15

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)