Max-Margin Additive Classifiers for Detection

Size: px

Start display at page:

Download "Max-Margin Additive Classifiers for Detection"

Pauline Stevenson
5 years ago
Views:

1 Max-Margin Additive Classifiers for Detection Subhransu Maji and Alexander C. Berg Sam Hare VGG Reading Group October 30, 2009

2 Introduction CVPR08: SVMs with additive kernels can be evaluated efficiently. This work: SVMs with additive kernels can be trained efficiently.

3 Additive classifiers Classifiers of the form: sign(f (x)) where f (x) = i f i (x i ) Sum of coordinate functions f i. An SVM with an additive kernel is an additive classifier.

4 Additive kernels When k(x, y) = i k i(x i, y i ), SVM decision function becomes: h(x) = j = j = i = i α j k(x, v j )+b α j k i (x i,v j i )+b i α j k i (x i,v j i )+b j f i (x i )+b CVPR08: Approximate each f i as piecewise constant or piecewise linear. Cost of evaluating decision function now independent of number of support vectors (like for a linear SVM).

5 Learning f i directly Previously, kernelised SVM trained as usual, and then f i approximated. Now want to learn f i directly. Assume parametric decision function f w (x) = i f w i i (x i ). Encode parameters w as ŵ and data points x as ˆx such that f w (x) ŵ T ˆx. Solve for optimal decision function in large-margin sense: f w = argmin f w { } R(ŵ) + 1 max(0, 1 y k ŵ T ˆx k ) n Basically a standard linear SVM objective (depending on form of R). k

6 Learning f i directly (cont.) When f w is additive and f w i i are a linear combination of a finite number of basis functions, ŵ = w. f w (x) = i = i w i,j g i,j (x i ) j w T i ˆx i = w T ˆx

7 Specific case: SVM with histogram intersection kernel Given support vectors {v j } and coefficients {α j }, coordinate function is: f i (x i ) = j α j min(x i, v j i ) By encoding min(x, y) φ(x) T φ(y), can rewrite f i in desired form: where f i (x i ) w T i φ(x i ) w i = j α j φ(v j i )

Encoding min(x, y) φ(x) T φ(y) Two options proposed in paper (assume x [0, 1]): φ 1 (x) = 1 N U(R(Nx)) where e.g. U(3) = (1, 1, 1, 0, 0, 0), R is a rounding function, and N determines discretisation scale.

8 Encoding min(x, y) φ(x) T φ(y) Two options proposed in paper (assume x [0, 1]): φ 1 (x) = 1 N U(R(Nx)) where e.g. U(3) = (1, 1, 1, 0, 0, 0), R is a rounding function, and N determines discretisation scale. φ 2 (x) = 1 N U (Nx) where e.g. U (3.5) = (1, 1, 1, 0.5, 0, 0), i.e. no rounding. Figure 1. From left to right min(x, y), φ 1(x)φ 1(y) and φ 2(x)φ 2(y) with N = 10. Note that the φ 2 encoding is very close to min. Quality of min approximation with φ 2 is much better.

9 What I don t get... φ serves two purposes: Approximates min. Determines form of f i, since f i (x i ) = wi T φ(x i ). φ 1 causes f i to be piecewise constant, and φ 2 causes f i to be piecewise linear, both with uniformly spaced breaks. These were the same approximations for fi considered in CVPR08 work. Presumably this is intentional, but paper didn t really make the connection clear.

10 Sparsity In principle, could now apply φ to training data and use off-the-shelf linear SVM solver. But impractical as encoding dense. Modify encoding slightly (see details in paper) to give φ s 1 and φ s 2, with only 1 or 2 nonzero values respectively. Also need to modify encoding of w such that w s T φ s (x) = w T φ(x). With w s, regularisation becomes non-standard, so custom solver (variant of PEGASOS) developed.

11 Results: Caltech examples 30 examples Encoding Training Algorithm Training Time(s) Accuracy(%) Training Time(s) Accuracy(%) identity LIBLINEAR (0.87) (0.94) (0.80) (1.33) identity LIBSVM (int kernel) (2.10) (0.61) (4.30) (0.78) snow=φ 1 LIBLINEAR (1.17) (0.58) (0.93) (1.02) φ 2 LIBLINEAR (1.43) (0.61) (1.09) (1.24) φ 2 PWLSGD (2.49) (0.45) (1.98) (0.72)

12 Results: Daimler Chrysler pedestrians Encoding Training Algorithm Training Time(s) Accuracy(%) identity LIBLINEAR 1.89 (00.10) (4.44) identity LIBSVM (int. kernel) (27.85) (1.42) snow=φ 1 LIBLINEAR 2.98 (00.33) (1.43) φ 2 LIBLINEAR 1.86 (00.04) (1.62) φ 2 PWLSGD 3.18 (00.01) (1.58)

13 Results: INRIA pedestrians Encoding Training Algorithm Training Time (HOG) Training Time (sphog) identity LIBLINEAR s identity LIBSVM (lin. lernel) >180 min 140 min identity LIBSVM (int. kernel) >180 min 148 min Encoding snow=φ 1 Training LIBLINEAR Algorithm Training 35.52s Time (HOG) Training sTime (sphog) identity φ 2 LIBLINEAR 22.45s s20.12s identity φ 2 LIBSVM PWLSGD (lin. lernel) 99.85s >180 min 76.12s 140 min Table 3. (HOG) 47, 327 features of 3780 dimension. Encoding Time 87.22s. DalalandTriggsuseamodifiedSVMLIGHTwhichisfaster identity LIBSVM (int. kernel) >180 min 148 min than LIBSVM, but stilltakes several minutes to train, slower thanourpwlsgdonφ2 encoding which produces both better classification using either HOG snow=φ or sphog 1 (below) LIBLINEAR and better detection (Fig. 2using35.52s sphog).(liblinearfailedto s trainontherawhogdata) (sphog) :Training38, φ featuresliblinear of 2268 dimension using PWLSGD22.45s on the φ2 encoding takes only 26.76s about 1% of the time taken to train an intersectionφkernel 2 SVM using PWLSGD LIBSVM, and performs as well for99.85s classification (see below) s 1 INRIA Pedestrians, Hard Training Data (HOG) 1 INRIA Pedestrians, Hard Training Data (sphog) INRIA Pedestrian Detections Detection Rate lin. + LIBSVM int. + LIBSVM φ 1, LIBLINEAR φ 2, LIBLINEAR φ 2, PWLSGD False Positive Rate Detection Rate lin. + LIBSVM int. + LIBSVM φ 1, LIBLINEAR φ 2, LIBLINEAR φ 2, PWLSGD False Positive Rate Detection Rate Detection Rate INRIA Pedestrian Detections sphog + φ 2 + PWLSGD sphog + (min) LIBSVM HOG + (lin) LIBSVM False Pos Per Image Figure 2. Left & Middle show classification results on the INRIA Pedestrian Hard Training Data (see Sec. 6.3 ). Left plot: HOG features and middle plot: sphog features. The HOG features were designed for a linear classifier, but there is still a small advantage to using the IKSVM (int+ LIBSVM) or PWLSGD over linear. The sphog features show a larger improvement from using IKSVM or PWLSGD, and have slightly higher overall performance. Right shows Detection plots on the INRIA benchmark. We compare our detector, sphog features with 100 IKSVMtimes or PWLSGD, faster to Dalal and training Triggs, HOG + than (lin.) LIBSVM. CVPR08. All the detectors are run at a stride of 8 8 pixels, and scaleratio of 2 1/8.Thecorrectdetectioncriteriaisratioofboundingboxintersection to union above 50%. sphog + (min) LIBSVM HOG + (lin) LIBSVM sphog + φ 2 + PWLSGD False Pos Per Image Figure 2. Left & Middle show classification results on the INRIA Pedestrian Hard Training Data (see Sec. 6.3 ). Left plot: HOG features

Workshop on Web- scale Vision and Social Media ECCV 2012, Firenze, Italy October, 2012 Linearized Smooth Additive Classifiers

Workshop on Web- scale Vision and Social Media ECCV 2012, Firenze, Italy October, 2012 Linearized Smooth Additive Classifiers Worshop on Web- scale Vision and Social Media ECCV 2012, Firenze, Italy October, 2012 Linearized Smooth Additive Classifiers Subhransu Maji Research Assistant Professor Toyota Technological Institute at