Supplemental Material: Probability Models for Open Set Recognition

Size: px

Start display at page:

Download "Supplemental Material: Probability Models for Open Set Recognition"

Colin Simon
5 years ago
Views:

1 Supplemental Material: Probability Models for Open Set Recognition A Results for Multiclass Open Set Recognition Accuracy The main article presents Fmeasure plots, but reviewers may be interested in accuracy plots for comparison. Accuracy is a common choice for evaluating binary decision classifiers. However, it is not always the best choice because it tends to underemphasize the distinction between correct positive and negative classification. Precisely defined, accuracy refers to the correctly classified samples (true positives T P and true negatives T N) out of all of the classification decisions (T P, T N, false positives F P, and false negatives F N). Accuracy = T P T N T P T N F P F N For open set accuracy, we consider a correct response to be either the correct class label or rejection if the test sample is from a class not used in training. Fig. depicts results for OLETTER, while Fig. depicts results for OMNIST. Comparing these plots with Figs. and 4 in the article, the same trends are evident () Accuracy % % 4% 6% 8% Openness 0% % 4% W SVM W SVM δ R =. MASCAP MAS NNCAP NN vs All Mult. RBF PlaB Pairwise Mult. RBF PlaB vs All Mult. RBF Pairwise Mult. RBF LogisJc Regression Figure. Accuracy for open set multiclass recognition on OLETTER. Common multiclass SVMs and pairwise SVM with Platt probabilities and a rejection option degrade quickly (vsall Multi. RBF, Pairwise Mult. RBF, and MAS are all comparable and visually overlap). The MAS algorithm with a CAP model, Nearest Neighbor algorithm with a CAP model, and the vsall SVM with Platt probabilities and a rejection option do a bit better, but still degrade much more than the WSVM. Error bars reflect standard deviation. For a detailed discussion of accuracy in the context of detection, please see Sec. 5 of our article Towards Open Set Recognition, IEEE TPAMI, vol. 6, no. 7, pp

2 Accuracy % % 4% 6% Openness 8% 0% % W SVM W SVM δ R =. MASCAP MAS NNCAP NN vs All Mult. RBF PlaB Pairwise Mult. RBF PlaB vs All Mult. RBF Pairwise Mult. RBF LogisJc Regression Figure. Accuracy for open set multiclass recognition on OMNIST. WSVM again maintains high Fmeasure scores as the problem grows to be more open, but common multiclass SVMs and existing thresholded probability estimators degrade quickly. Nearest neighbor with a CAP model and MAS with a CAP model are again better than their baselines. All algorithms except the WSVM, NNCAP, MASCAP, and Logistic Regression are comparable and visually overlap. Error bars reflect standard deviation. B Algorithmic Pseudocode for WSVM Alg. provides a precise description of the WSVM EVT probability modeling for each of the k classes considered in multiclass recognition. Each class y i in the multiclass problem is represented individually as the combination of a oneclass RBF SVM and a vsall binary RBF SVM. It assumes that all of the individual oneclass SVMs and vsall binary SVMs have been trained a priori. We use (and refer to) functions in the open source libmr library, which is available from We estimate the tail size q i (number of extrema) based on the number of support vectors for the associated SVMs, using 0.5 times the number of support vectors for our detection task, which has very high openness and hence needs strong rejection, and.5 times the number of support vectors for our multiclass recognition task, which is more permissive at a lower level of openness. For the binary SVMs, parameters are estimated for both the nonmatch (where we fit a Reverse Weibull) and the match (where we fit a Weibull) distributions. Note that in doing so, the match data for one class will be considered nonmatch data for another. The sets of decision scores d i, d i and d o i are collected from the nonmatch and the match scores for class y i. We use libmr to compute the parameters. We note the match scores are effectively transformed in the library, so that larger values are the points closer to the margin. In steps 5, 6 and 7, we call off to the library function FitSVM to estimate the key Reverse Weibull or Weibull parameters. The first parameter is the set of scores, the second is the class label, the third indicates if the class of interest should have positive scores, the fourth is a relatively selfevident enum indicating the type of model, and the fifth is the tail size to use in fitting. Alg. gives a description of the WSVM probability estimation for a new test sample. The function computes ι, P η and P ψ for each class. Given these values one can use Eq. 9 from the paper to make a final decision. In terms of computational costs, the EVT fittings and probability estimations are very fast, taking only milliseconds for each sample. Compared to using Plattstyle Sigmoids and crossvalidation for parameter estimation, the WSVM is much faster. Source code for the WSVM (provided as a patch to the popular LIBSVM package) will be released after this article has been published.

3 Algorithm WSVM MultiClass Model Fitting Require: Classes y i, i= k; Pretrained vsall binary SVM for each class y i, with score functions f i and support vectors α i, α i ; Pretrained oneclass SVM for each class y i, with score functions f o i and support vectors αo i ; Labeled training data X i ; Tail size multiplier θ = 0.5 for detection, or θ =.5 for multiclass recognition vsall binary SVM scores s i,j = f i (x j ), x j X i ; Oneclass SVM scores o i,j = f o i (x j), x j X i ; MetaRecognition object MR from libmr, providing MLEbased Weibull fitting and score calibration. for i = k do. Let q i = θ α i, q i = θ α i, qo i = θ αo i. Let d i be the set of binary nonmatch scores: s i,j = f i (x j ) when x j does not belong to y i. Let d i be the set of binary match scores: s i,j = f i (x j ) when x j belongs to y i 4. Let d o i be the set of oneclass scores: o i,j = f o i (x j) when x j belongs to y i 5. Let [ν ψ,i, λ ψ,i, κ ψ,i ] = MR.fitSVM(d i, y i, false, MetaRecognition::complement model, q i ) 6. Let [ν η,i, λ η,i, κ η,i ] = MR.fitSVM(d i, y i, true, MetaRecognition::positive model, q i ) 7. Let [ν o,i, λ o,i, κ o,i ] = MR.fitSVM(d o i, y i, false, MetaRecognition::positive model, q o i ) end for return W = [ν η,i, λ η,i, κ η,i, ν ψ,i, λ ψ,i, κ ψ,i, ν o,i, λ o,i, κ o,i ] Algorithm WSVM MultiClass Probability Estimation Require: Classes y i, i= k; Pretrained vsall binary SVM for each class y i, with score function f i ; Pretrained oneclass SVM for each class y i, with score functions f o i ; MetaRecognition object MR from libmr and parameter set W. function WSVMPREDICT(test sample x) for i = k do ι i = 0 if x > ν o,i then P o,i = MR.W score(f o i (x); ν o,i, κ o,i, λ o,i ) = if P o,i > 0.00 then ι i = ; else P o,i = 0 if x > ν η,i then P η,i = MR.W score(f i (x); ν η,i, κ η,i, λ η,i ) = else P η,i = 0 if x > ν ψ,i then P ψ,i = MR.W score(f i (x); ν ψ,i, κ ψ,i, λ ψ,i ) else P ψ,i = 0 end for return [P η,, P ψ,, ι..., P η,k, P ψ,k, ι k ] end function ( e ( ) f o κo,i ) i (x) ν o,i λ o,i ( e ( f i(x) ν η,i λ η,i ) κη,i ) ( e ( fi(x) ν ψ,i λ ψ,i ) κψ,i )

4 Density of! nonmatch scores Density of! match scores Density of! match scores C StepbyStep Examples for WSVM Training and Testing In order to give the reader a more intuitive look at the operation of the WSVM algorithm, we include a series of illustrations for three examples: training, testing when the input comes from a known class, and testing when the input comes from an unknown class. These illustrations correspond to the description of the WSVM algorithm provided in Sec. IV of the main article, and the pseudocode found above in this supplemental material. Step. Train a Oneclass SVM f o Step. Fit Weibull Distribution Over Tail of Scores from f o Class Label = Weibull Fit to Match Data RBF oneclass SVM yields a CAP model Class... λo, νo, κo δτ =0.00 Step. Train a Binary RBF SVM f Step 4. Fit EVT Distributions Over Tails of Scores from f Class Label = Known Negative Classes = 0,,... Reverse Weibull Fit to! NonMatch Data λψ, νψ, κψ Known Other Negative known classes Classes ( 0, =, 0, ), SVM Decision! Boundary δr = 0.5 openness Weibull Fit to! Match Data λη, νη, κη Class... Figure. WSVM Training. For simplicity, we show the procedure for a single class (the same steps are repeated for k known classes during training). In steps and, we train a oneclass SVM f o and fit a Weibull distribution to the corresponding tail of scores from that SVM for samples of data known at training time. This gives us a Weibull model of the data with parameters λ o, ν o, κ o. We set the oneclass probability threshold δ τ to In steps and 4, we train a binary SVM f and fit two EVT distributions to the corresponding scores from that SVM for samples of data known at training time. A Reverse Weibull distribution is fit to the tail of the nonmatch data (scores for known classes other than ), giving us a model with parameters λ η, ν η, κ η. A Weibull distribution is fit to the tail of the match data (scores for class ), giving us a model with parameters λ ψ, ν ψ, κ ψ. We set the rejection threshold δ R for the overall WSVM to 0.5 openness (see Eq. 0 in the paper for a definition of openness). The collection of SVM models, EVT distribution parameters, and thresholds constitute the WSVM.

5 Step. Apply Oneclass SVM CAP Model for All Known Classes Input: x = f o (x) = s0 0 f o (x) = s f o (x) = s f o (x) = s Step. Test Probabilities Po(0 x) < δτ, ι0 = 0; Po( x) < δτ, ι = 0; Po( x) > δτ, ι = ; Po( x) > δτ, ι = Step 5. Normalize all Binary SVM Scores Using EVT Match and Nonmatch Models λη,, νη,, κη, Pη, λη,, νη,, κη, s s λψ,, νψ,, κψ, Pψ, Apply CDFs per class for each score Pη, λψ,, νψ,, κψ, Pψ, Step. Normalize All Oneclass SVM Scores Using EVT Models λo,0, νo,0, κo,0 s0 λo,, νo,, κo, s Step 4. Apply Binary SVMs Step 6. Fuse and Test Probabilities Pη,0(x) Pψ,0(x) ι0 = 0 < δr Pη,(x) Pψ,(x) ι = 0 < δr Pη,(x) Pψ,(x) ι = 0.00 < δr Pη,(x) Pψ,(x) ι = > δr Monotonically decreasing prob. bound Threshold on prob. x λo,, νo,, κo, s Apply CDF for each class to each score λo,, νo,, κo, s f (x) = s f (x) = s Models for class and the data point for this example CAP thresholded region Prob. from kernel machine varies locally with distance to training points P( ) > δr Probability model for test instance: Po WSVM thresholded region Figure 4. WSVM testing when the input comes from a known class ( ). Classes marked in red indicate rejection, and those marked in green indicate acceptance. The first stage of the algorithm (steps ) applies the oneclass SVM CAP models from each class to the input data, normalizes the resulting scores to probabilities, and then tests each probability against the threshold δ τ. The indicator variable ι y is set to if the probability exceeds δ τ, and 0 otherwise (thus eliminating those classes as possibilities later in step 6). The second stage of the algorithm (steps 4 6) applies the binary SVMs from each class to the same input data, normalizes the resulting scores to two probability estimates per class (one for the match model and another for the nonmatch model), and then fuses and tests the probability estimates. Any fused probability exceeding the WSVM rejection threshold δ R indicates acceptance for a class. Note for brevity, we only show the process for classes and in steps 4 and 5 (short circuit evaluation is possible using the indicator variables ι 0 and ι ; an indicator variable with a value of 0 always results in a fused probability of 0). The bottom panel on the right shows where the data point for this example falls with respect to the CAP thresholded region (defined by the oneclass SVM) and the overall WSVM thresholded region for the class.

6 Step. Apply Oneclass SVM CAP Model for All Known Classes Input: x = Q f o (x) = s0 0 f o (x) = s f o (x) = s f o (x) = s Step. Test Probabilities Po(0 x) < δτ, ι0 = 0; Po( x) < δτ, ι = 0; Po( x) < δτ, ι = 0; Po( x) < δτ, ι = 0 Step 4. Apply Indicator Variables to Binary SVMs Pη,0(x) Pψ,0(x) ι0 = 0 < δr Step. Normalize all Oneclass SVM Scores using EVT models λo,0, νo,0, κo,0 s0 λo,, νo,, κo, Monotonically decreasing prob. bound Threshold on prob. λo,, νo,, κo, s Apply CDF for each class to each score s λo,, νo,, κo, Models for class and the data point for this example s Prob. from kernel machine varies locally with distance to training points Probability model for test instance: Po Pη,(x) Pψ,(x) ι = 0 < δr Pη,(x) Pψ,(x) ι = 0 < δr Pη,(x) Pψ,(x) ι = 0 < δr P( Q) = 0 x WSVM thresholded region CAP thresholded region Figure 5. WSVM testing when the input comes from an unknown class ( Q ). Classes marked in red indicate rejection. The procedure follows the same process depicted in Fig. 4. In this example, we show what happens when a data point falls outside the CAP thresholded region (i.e. exists in open space) for each known class. The CAP models for the knowns classes do not produce probabilities that exceed the threshold δ τ, thus all of the indicator variables for the known classes (ι 0, ι, ι, ι ) are set to 0, resulting in rejection for each class. The bottom panel on the right shows where the data point for this example falls with respect to the CAP thresholded region (defined by the oneclass SVM) and the overall WSVM thresholded region for the class.

Pre-print of article that will appear in T-PAMI.

Pre-print of article that will appear in T-PAMI. 204 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising