K-HYPERPLANE HINGE-MINIMAX CLASSIFIER. Margarita Osadchy, Tamir Hazan, Daniel Keren, University of Haifa

2 Goal
A non-linear binary classifier for imbalanced data sets that is fast and scalable.
Natural applications: object detection, fraud detection.

3 Relatively small number of samples: SVM. The zero-one loss is upper bounded by the hinge loss, $\frac{1}{n}\sum_{i=1}^{n}\max\{0,\,1-y_i w^\top x_i\}$ [Vapnik 2000; Zhang 00; Bartlett & Mendelson 2003; Bousquet et al. 2004; Kakade et al. 2009].
Infinitely many training samples: minimax. The zero-one risk is upper bounded by the worst-case risk among all data distributions $Z$, $\sup_{z\sim Z}\Pr(w^\top z\ge 0)$ [Lanckriet et al. 2003; Honorio & Jaakkola 2014].
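The hinge bound holds term by term: whenever $y_i w^\top x_i\le 0$ the hinge term is at least 1. A minimal NumPy sketch that computes both empirical losses; the synthetic data and the weight vector `w` are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def zero_one_loss(w, X, y):
    """Empirical zero-one loss of sign(w^T x); a margin of exactly 0 counts as an error."""
    return float(np.mean(y * (X @ w) <= 0))


def hinge_loss(w, X, y):
    """Empirical hinge loss (1/n) * sum_i max{0, 1 - y_i w^T x_i}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * (X @ w))))


# Hypothetical data: labels in {-1, +1} driven mostly by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)
w = np.array([1.0, 0.2, -0.1])
print(zero_one_loss(w, X, y), hinge_loss(w, X, y))  # the first value never exceeds the second
```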

4 Imbalanced problems: hinge risk vs. minimax
Positive set: small number of samples, so use the hinge risk.
Negative set: very large number of samples, so use the minimax risk.
Combining the two gives the Hinge-Minimax classifier.

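5 Hinge-Minimax Linear Classifier
Classifier: $\hat y=\mathrm{sign}(w^\top x)$, $w\in\mathbb{R}^d$.
Hinge risk bound: $E[\ell_{0/1}(\hat y,y)]\le E[\max\{0,\,1-y\,w^\top x\}]$.
Hinge risk bound for the positive class: $H=E_{x|y=+1}[\max\{0,\,1-w^\top x\}]$.
Minimax risk bound: $E_{x|y=-1}[\ell_{0/1}(\hat y,y)]\le\sup_{z\sim Z}\Pr(w^\top z\ge 0)$, where $Z$ ranges over all distributions with $E_{z\sim Z}[z]=E_{x|y=-1}[x]$ and $E_{z\sim Z}[zz^\top]=E_{x|y=-1}[xx^\top]$.
Minimax risk bound for the negative class: $M=\sup_{z\sim Z}\Pr(w^\top z\ge 0)$.
Hinge-minimax bound: $E[\ell_{0/1}(\hat y,y)]\le\Pr(y=+1)\,H+\Pr(y=-1)\,M=MH$.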
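A small plug-in version of the combined bound for a single hyperplane, as a sketch under stated assumptions: the class probabilities and the negatives' mean and covariance are replaced by sample estimates, there is no separate bias term (append a constant feature if one is needed), and $M$ uses the closed form that the worst-case probability takes for one hyperplane, the one-sided Chebyshev (Cantelli) bound $1/(1+(w^\top\mu)^2/(w^\top\Sigma w))$ when $w^\top\mu<0$. The function names are hypothetical.

```python
import numpy as np


def hinge_term(w, X_pos):
    """Plug-in estimate of H: mean over positives of max{0, 1 - w^T x}."""
    return float(np.mean(np.maximum(0.0, 1.0 - X_pos @ w)))


def minimax_term(w, mu, Sigma):
    """sup over distributions with mean mu and covariance Sigma of Pr(w^T z >= 0).

    For one hyperplane this is the one-sided Chebyshev (Cantelli) bound:
    1 / (1 + (w^T mu)^2 / (w^T Sigma w)) if w^T mu < 0, and 1 otherwise.
    Assumes w^T Sigma w > 0.
    """
    mean_score = float(w @ mu)
    if mean_score >= 0:
        return 1.0
    return 1.0 / (1.0 + mean_score ** 2 / float(w @ Sigma @ w))


def hinge_minimax_bound(w, X_pos, X_neg):
    """Plug-in MH = Pr(y=+1) * H + Pr(y=-1) * M using empirical class frequencies and moments."""
    p_pos = len(X_pos) / (len(X_pos) + len(X_neg))
    mu = X_neg.mean(axis=0)
    Sigma = np.cov(X_neg, rowvar=False)  # unbiased (n - 1) sample covariance of the negatives
    return p_pos * hinge_term(w, X_pos) + (1.0 - p_pos) * minimax_term(w, mu, Sigma)


# Hypothetical toy data.
w = np.array([1.0, 0.0])
X_pos = np.array([[2.0, 0.0], [3.0, 1.0]])
X_neg = np.array([[-2.0, 0.0], [-4.0, 0.0], [-3.0, 1.0], [-3.0, -1.0]])
print(round(hinge_minimax_bound(w, X_pos, X_neg), 4))  # 0.046
```

Here both positives clear the margin, so $H=0$ and the bound comes entirely from the minimax term on the negatives, weighted by their frequency $4/6$.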

6 [Figure: MH, H, M]
ECCV 2012: applied the linear classifier to vision problems and generalized it to a kernel classifier with a fixed number of support vectors. This is faster than an SVM, but computing non-linear support vectors is still expensive.

7 Intersection of K Hyperplanes
$f(x)=1$ if $w_i^\top x\ge 0$ for all $i=1,\dots,K$, and $f(x)=0$ otherwise [Klivans & Servedio 2004; Arriaga & Vempala 1999].
Computationally costly for a large number of negative examples.
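The decision rule is a single vectorized test: a point is labeled positive only if it lies on the non-negative side of every hyperplane, so prediction costs $O(Kd)$ per example. A minimal sketch, assuming the $K$ normals are stored as the columns of a $d\times K$ matrix `W` (a layout chosen here for illustration).

```python
import numpy as np


def intersection_predict(W, X):
    """f(x) = 1 if w_i^T x >= 0 for every column w_i of W, else 0.

    W: (d, K) matrix whose columns are the K hyperplane normals.
    X: (n, d) matrix of examples.
    """
    return np.all(X @ W >= 0, axis=1).astype(int)


# Hypothetical example: K = 2 hyperplanes in 2D defining the quadrant x1 >= 0, x2 >= 0.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
X = np.array([[1.0, 2.0], [-1.0, 2.0], [0.5, -0.5], [0.0, 0.0]])
print(intersection_predict(W, X))  # [1 0 0 1]
```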

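8 Intersection of K Hyperplanes (cont.)
$f(x)=1$ if $w_i^\top x\ge 0$ for all $i=1,\dots,K$, and $f(x)=0$ otherwise; $E[\ell_{0/1}(f(x),y)]\le MH$, with
$H=E_{x|y=+1}\big[\sum_{i=1}^{K}\max\{0,\,1-w_i^\top x\}\big]$ and $M=\sup_{z\sim Z}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)$.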
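Why the summed hinge term bounds the error on positives: a positive is rejected when some $w_i^\top x<0$, and that single term already contributes $\max\{0,1-w_i^\top x\}>1$ to the sum. A minimal sketch that checks this numerically, assuming the same $d\times K$ column layout for `W`; the function names and data are hypothetical.

```python
import numpy as np


def k_hinge(W, X_pos):
    """Empirical H for K hyperplanes: mean over positives of sum_i max{0, 1 - w_i^T x}."""
    return float(np.mean(np.maximum(0.0, 1.0 - X_pos @ W).sum(axis=1)))


def positive_error(W, X_pos):
    """Fraction of positives rejected by the intersection (some w_i^T x < 0)."""
    return float(np.mean(np.any(X_pos @ W < 0, axis=1)))


W = np.eye(2)  # columns w_1 = (1, 0), w_2 = (0, 1)
X_pos = np.array([[2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]])
print(positive_error(W, X_pos), k_hinge(W, X_pos))  # 1/3 and 1.0
```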

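9 How to compute $M=\sup_{z\sim Z}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)$?
The set $\{z:\ w_i^\top z\ge 0,\ i=1,\dots,K\}$ is convex. Let $Z(\mu,\Sigma)$ be all distributions with known mean $\mu$ and covariance $\Sigma$. For $K$ fixed hyperplanes $w_i$, $i=1,\dots,K$ [Marshall & Olkin 1960]:
$\sup_{z\sim Z(\mu,\Sigma)}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)=\frac{1}{1+d^2}$, where $d^2=\inf_{z:\ w_i^\top z\ge 0,\ i=1,\dots,K}(z-\mu)^\top\Sigma^{-1}(z-\mu)$.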
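Computing $d^2$ is a convex quadratic program over a polyhedral cone. As an illustration (not the authors' solver), for small $K$ one can enumerate which constraints are active: treating a subset $A$ as equalities $W_A^\top z=0$ gives the closed-form minimizer $z_A=\mu-\Sigma W_A(W_A^\top\Sigma W_A)^{+}W_A^\top\mu$, and the optimum is the feasible candidate with the smallest Mahalanobis distance. This costs $2^K$ small solves and assumes $\Sigma$ is positive definite; the function names are hypothetical.

```python
import itertools

import numpy as np


def mahalanobis_dist_sq_to_cone(W, mu, Sigma, tol=1e-9):
    """d^2 = min (z - mu)^T Sigma^{-1} (z - mu) subject to W^T z >= 0.

    Brute-force active-set enumeration (columns of W are the w_i); practical only for small K.
    """
    K = W.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    best = np.inf
    for r in range(K + 1):
        for A in itertools.combinations(range(K), r):
            if r == 0:
                z = mu.copy()  # no active constraint: the unconstrained minimizer
            else:
                WA = W[:, list(A)]
                # Minimizer on {W_A^T z = 0}; pinv also covers linearly dependent columns.
                lam = np.linalg.pinv(WA.T @ Sigma @ WA) @ (WA.T @ mu)
                z = mu - Sigma @ WA @ lam
            if np.all(W.T @ z >= -tol):  # keep only candidates inside the cone
                diff = z - mu
                best = min(best, float(diff @ Sigma_inv @ diff))
    return best


def marshall_olkin_bound(W, mu, Sigma):
    """Worst-case Pr(w_i^T z >= 0 for all i) over distributions with mean mu, covariance Sigma."""
    return 1.0 / (1.0 + mahalanobis_dist_sq_to_cone(W, mu, Sigma))


print(marshall_olkin_bound(np.eye(2), np.array([-1.0, -1.0]), np.eye(2)))  # ≈ 0.3333
```

The enumeration is correct because the true minimizer satisfies the KKT conditions for its own active set, so it appears among the candidates, and every other feasible candidate can only be farther from $\mu$.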

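10 Example [Figure]
$z^*=\arg\min_{z:\ w_i^\top z\ge 0}(z-\mu)^\top\Sigma^{-1}(z-\mu)$. $\tilde W$ is a matrix whose columns $\tilde w_i$ satisfy $\tilde w_i^\top z^*=0$, i.e., the hyperplanes that are active at $z^*$.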

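11 How to compute $M=\sup_{z\sim Z}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)$? (cont.)
Let $Z(\mu,\Sigma)$ be all distributions with known mean $\mu$ and covariance $\Sigma$. For $K$ fixed hyperplanes $w_i$, $i=1,\dots,K$: $\sup_{z\sim Z(\mu,\Sigma)}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)=\frac{1}{1+d^2}$, with $d^2=\inf_{z:\ w_i^\top z\ge 0}(z-\mu)^\top\Sigma^{-1}(z-\mu)$.
Let $z^*=\arg\min_{z:\ w_i^\top z\ge 0}(z-\mu)^\top\Sigma^{-1}(z-\mu)$ and let $\tilde W$ be a matrix whose columns $\tilde w_i$ satisfy $\tilde w_i^\top z^*=0$. We showed that
$d^2=\mu^\top\tilde W(\tilde W^\top\Sigma\tilde W)^{-1}\tilde W^\top\mu$.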
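One way to see this formula, as a standard Lagrange-multiplier sketch rather than necessarily the authors' proof, assuming the columns of $\tilde W$ are linearly independent so that $\tilde W^\top\Sigma\tilde W$ is invertible. Dropping the constraints that are inactive at $z^*$ leaves $z^*$ optimal (the problem is convex), and since $z^*$ lies on $\{\tilde W^\top z=0\}$, it also minimizes the distance over that subspace:

```latex
\begin{aligned}
&\min_{z}\ (z-\mu)^\top\Sigma^{-1}(z-\mu)\quad\text{s.t.}\quad \tilde W^\top z = 0\\
&\Sigma^{-1}(z-\mu) = \tilde W\lambda \;\Rightarrow\; z = \mu + \Sigma\tilde W\lambda\\
&\tilde W^\top z = 0 \;\Rightarrow\; \lambda = -(\tilde W^\top\Sigma\tilde W)^{-1}\tilde W^\top\mu\\
&z^* = \mu - \Sigma\tilde W(\tilde W^\top\Sigma\tilde W)^{-1}\tilde W^\top\mu\\
&d^2 = (z^*-\mu)^\top\Sigma^{-1}(z^*-\mu) = \lambda^\top\tilde W^\top\Sigma\tilde W\lambda
      = \mu^\top\tilde W(\tilde W^\top\Sigma\tilde W)^{-1}\tilde W^\top\mu
\end{aligned}
```

With a single active hyperplane $w$ this reduces to $d^2=(w^\top\mu)^2/(w^\top\Sigma w)$, the one-sided Chebyshev (Cantelli) case.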

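12 Expected Risk for $y=-1$
$M=\sup_{z\sim Z(\mu,\Sigma)}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)=\dfrac{1}{1+\mu^\top\tilde W(\tilde W^\top\Sigma\tilde W)^{-1}\tilde W^\top\mu}$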

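13 Uniform Generalization Bound
We showed that, with probability at least $1-\delta$ (confidence),
$MH\le H_S+M_S+O\big(K\sqrt{\log(1/\delta)/m}\big)$,
where $H_S$ is the empirical estimate of $H$, $M_S$ is the empirical estimate of $M$, and $m$ is the training set size.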

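14 Proof Sketch
Show $H\le H_S+O\big(K\sqrt{\log(1/\delta)/m}\big)$: extension of the Rademacher complexity to $\mathbb{R}^{d\times K}$.
Show $M\le M_S+O\big(K\sqrt{\log(1/\delta)/m}\big)$: show it for $K=1$; the proof for $K>1$ follows the same steps.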

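15 For K = 1
$M=\sup_{z\sim Z(\mu,\Sigma)}\Pr(w^\top z\ge 0)$ is estimated by $\hat M=\sup_{z\sim Z(\hat\mu,\hat\Sigma)}\Pr(w^\top z\ge 0)$, using the empirical mean $\hat\mu$ and covariance $\hat\Sigma$. We show a bound on the gap between $M$ and $\hat M$ […], where $\alpha$ bounds the minimal eigenvalue of $\Sigma$.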

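16 For K = 1 (cont.)
Using the Bernstein inequality for vectors [Gross 2011; Candès & Plan 2011], the deviations of $\hat\mu$ from $\mu$ and of $\hat\Sigma$ from $\Sigma$ are bounded with high probability in terms of $\log(1/\delta)$ and $\hat m$ […], where $\hat m$ is the number of negative examples.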

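17 For K = 1 (cont.)
$M_S=\frac{\hat m}{m}\sup_{z\sim Z(\hat\mu,\hat\Sigma)}\Pr(w^\top z\ge 0)$.
Estimate $E_y[\mathbb{1}[y=-1]]$ by its empirical mean $\hat m/m$ and use the Hoeffding inequality to bound its deviation from the expectation.
Combining all together: with probability of error at most $3\delta$, $M$ exceeds $M_S$ by at most deviation terms in $\log(1/\delta)/m$ with constants $c_1$ and $c_2$ […].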
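The Hoeffding step on this slide concerns the class frequency $\hat m/m$, an average of $m$ i.i.d. indicators in $[0,1]$. The slide does not say whether the one- or two-sided form or which constants were used; this sketch assumes the two-sided form, $\Pr(|\hat m/m-p|\ge t)\le 2e^{-2mt^2}$, and checks the resulting radius by simulation. The function name and the 90% negative rate are hypothetical.

```python
import math

import numpy as np


def hoeffding_radius(m, delta):
    """Two-sided Hoeffding radius: with probability >= 1 - delta, the mean of m i.i.d.
    [0, 1] variables is within sqrt(log(2 / delta) / (2 m)) of its expectation."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))


# Hypothetical population in which 90% of the examples are negative.
rng = np.random.default_rng(0)
p_neg, m, delta = 0.9, 1000, 0.05
r = hoeffding_radius(m, delta)
m_hat = rng.binomial(m, p_neg, size=2000)  # number of negatives in 2000 simulated training sets
violations = np.mean(np.abs(m_hat / m - p_neg) >= r)
print(round(r, 4), violations)  # r ≈ 0.0429; the violation rate should not exceed delta
```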

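18 Algorithm
Minimize the bound: $\min_{w_1,\dots,w_K}\ H_S+M_S$, where $H_S$ averages $\sum_{i=1}^{K}\max\{0,\,1-w_i^\top x\}$ over the positive training examples and $M_S$ is based on $\sup_{z\sim Z(\hat\mu,\hat\Sigma)}\Pr(w_i^\top z\ge 0,\ i=1,\dots,K)$. This is a convex optimization for $K=1$, but hard to compute for $K>1$.
Approximation algorithm: find the $K$ hyperplanes greedily, using convex optimization for a single hyperplane, then iteratively refine the $K$ hyperplanes.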

19 Approximation Algorithm: Greedy Phase

20 Approximation Algorithm: Greedy Step

21 Approximation Algorithm: Greedy Step

22 Approximation Algorithm: Greedy Step

23 Approximation Algorithm: Greedy Phase, K = 5 [Figure panels: K = 1, K = 2, K = 3, K = 4, K = 5]

24 Approximation Algorithm: Refinement Phase. Keep K − 1 hyperplanes fixed and find the Kth hyperplane.

25 Approximation Iterative Algorithm: Refinement Phase. Keep K − 1 hyperplanes fixed and find the Kth hyperplane.

26 Approximation Iterative Algorithm: Refinement Phase. Keep K − 1 hyperplanes fixed and find the Kth hyperplane.

27 Approximation Algorithm, K = 5: Refinement Phase (5 iterations)
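The slides give only the outline of the greedy and refinement phases, so the skeleton below fills in the unspecified details with explicit assumptions. (1) A plain regularized hinge surrogate, solved by subgradient descent, stands in for the authors' single-hyperplane hinge-minimax convex program. (2) Each greedy step is fit against the negatives that the hyperplanes found so far still accept. (3) Refinement refits one hyperplane at a time with the other K − 1 held fixed. All function names and the toy data are hypothetical; this is a sketch of the control flow, not the published method.

```python
import numpy as np


def fit_single(X_pos, X_neg, lam=1e-2, lr=0.1, epochs=200):
    """Hypothetical single-hyperplane solver (a stand-in, NOT the authors' objective).

    Subgradient descent on
        mean_{X_pos} max{0, 1 - w^T x} + mean_{X_neg} max{0, 1 + w^T x} + lam/2 ||w||^2.
    """
    w = np.zeros(X_pos.shape[1])
    for _ in range(epochs):
        grad = lam * w
        viol_pos = X_pos @ w < 1  # positives inside the margin
        grad -= X_pos[viol_pos].sum(axis=0) / len(X_pos)
        if len(X_neg) > 0:
            viol_neg = X_neg @ w > -1  # negatives inside the margin
            grad += X_neg[viol_neg].sum(axis=0) / len(X_neg)
        w -= lr * grad
    return w


def accepted(hyperplanes, X):
    """Boolean mask of rows of X with w^T x >= 0 for every w in `hyperplanes`."""
    if not hyperplanes:
        return np.ones(len(X), dtype=bool)
    return np.all(X @ np.column_stack(hyperplanes) >= 0, axis=1)


def greedy_then_refine(X_pos, X_neg, K, n_refine=5):
    """Returns a (d, K) matrix whose columns are the hyperplanes."""
    W = []
    # Greedy phase (assumed): each new hyperplane targets negatives still accepted.
    for _ in range(K):
        W.append(fit_single(X_pos, X_neg[accepted(W, X_neg)]))
    # Refinement phase: keep K - 1 hyperplanes fixed and refit the remaining one.
    for _ in range(n_refine):
        for k in range(K):
            others = W[:k] + W[k + 1:]
            W[k] = fit_single(X_pos, X_neg[accepted(others, X_neg)])
    return np.column_stack(W)


def add_bias(X):
    """Append a constant 1 so that each w_i carries a bias term."""
    return np.hstack([X, np.ones((len(X), 1))])


# Hypothetical 2D data: positives near the origin, negatives on a surrounding circle.
rng = np.random.default_rng(0)
X_pos = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=500)
X_neg = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])
W = greedy_then_refine(add_bias(X_pos), add_bias(X_neg), K=5)
print(W.shape)  # (3, 5)
```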

28 Experiments: Synthetic Data
Test the robustness to imbalance in the data. The points are equally partitioned into training, validation, and test sets.

29 Experiments: Letters Data Set
UCI Machine Learning Repository [Murphy & Aha 1994]: 16-dimensional features, 26 letters of the English alphabet.

30 Experiments: Scene Recognition
397 categories of the SUN database [Xiao et al. 2010]. Features: bag of words (BOW) of dense HOG with 300 words. The data is divided into 50 training and 50 test images per category in 10 folds.

31 Questions? Thank you!
