arxiv: v1 [cs.lg] 23 Aug 2018


Multiclass Universum SVM

Sauptik Dhar (1), Vladimir Cherkassky (2), Mohak Shah (1, 3)

(1) LG Silicon Valley Lab, Santa Clara, CA, USA. (2) University of Minnesota, MN, USA. (3) University of Illinois at Chicago, IL, USA. Correspondence to: Sauptik Dhar, sauptik.dhar@gmail.com.

Abstract

We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose an analytic span bound for model selection that is 2-4 times faster to compute than standard resampling techniques. We empirically demonstrate the efficacy of the proposed MU-SVM formulation on several real-world datasets, achieving more than 20% improvement in test accuracies compared to multiclass SVM.

1. Introduction

Many applications of machine learning involve analysis of sparse high-dimensional data, where the number of input features is larger than the number of data samples. Such settings are typically seen in several real-life applications in domains such as healthcare, autonomous driving, prognostics and health management, etc. (Cherkassky & Mulier, 2007). Such high-dimensional data sets present new challenges for most learning problems. Novel data-intensive deep architectures are naturally not suited for such scenarios (Goodfellow et al., 2016). Recent studies have shown Universum learning to be particularly effective for such high-dimensional low sample size data settings (Sinz et al., 2008; Chen & Zhang, 2009; Dhar & Cherkassky, 2015; Lu & Tong, 2014; Qi et al., 2014; Shen et al., 2012; Wang et al., 2014; Zhang et al., 2008; Xu et al., 2015; 2016; Zhu, 2016; Chen et al., 2017; Dhar & Cherkassky, 2017). However, most such studies are limited to binary classification problems. On the other hand, many practical applications involve classification of more than two categories. In order to incorporate a priori knowledge (in the form of universum data) for such applications, there is a need to extend universum learning to multiclass problems. In this paper we focus on formulating universum learning for multiclass SVM under balanced settings with equal misclassification costs.

Researchers have proposed several methods to solve a multiclass SVM problem. Typically these methods follow two basic approaches (Hsu & Lin, 2002; Wang & Xue, 2014). The first approach follows an Error Correcting Output Code (ECOC) based setting (Dietterich & Bakiri, 1995), where several binary classifiers are combined to solve the multiclass problem, viz. one-vs-one, one-vs-all, directed acyclic graph SVM (Platt et al., 1999). Previous works which follow this setting, such as (Sinz, 2007; Chen & Zhang, 2009), focus on the binary universum learning paradigm and only provide some hints for their extension to multiclass problems. An alternative to the ECOC based setting is the direct approach, where the entire multiclass problem is solved through a single larger optimization formulation (Vapnik, 1998; Crammer & Singer, 2002; Weston & Watkins, 1998). Recently, (Zhang & LeCun, 2017) adopted such a direct approach for universum learning under a probabilistic framework using a logistic loss function. This paper also adopts such a direct approach, but proposes an alternate universum learning framework that utilizes an SVM-like loss function following (Crammer & Singer, 2002), and introduces the Multiclass Universum SVM (MU-SVM) formulation. The proposed framework allows for: a) an efficient implementation of MU-SVM using existing multiclass SVM solvers (Section 3.2), and b) deriving practical analytic error bounds for model selection (Section 3.3). Further, compared to ECOC based approaches, we provide a unified framework for multiclass learning under universum settings, with similar (or better) performance accuracies (see Appendix B.1).

The main contributions of this paper are as follows:

1. We formalize the notion of universum learning for SVM under multiclass settings, and propose a novel direct formulation called Multiclass Universum SVM (MU-SVM) (in Section 3.1).
The proposed MU-SVM formulation has the neat property that it reduces to: i) the standard Crammer & Singer (C&S) multiclass SVM in the absence of universum data and ii) the binary U-SVM formulation (Weston et al., 2006) for two-class problems (Section 3.1, Proposition 1). This consolidates the propriety of MU-SVM as the apt extension of multiclass SVM under universum settings.

2. The proposed formulation has a desirable structure that renders the MU-SVM formulation solvable through

any state-of-the-art multiclass SVM solver (Section 3.2, Proposition 2).

3. We provide a new Span definition for multiclass formulations, and derive a leave-one-out bound for MU-SVM (Section 3.3, Theorem 1). Under additional assumptions, we provide a computationally efficient version of the leave-one-out error bound (Section 3.3, Theorem 2), which presents a practical mechanism for model selection.

4. Empirical results are provided in support of the proposed strategy (Section 4).

Finally, conclusions are presented in Section 5. Note that a shorter version of this work is available in (Dhar et al., 2016). Compared to (Dhar et al., 2016), this paper includes additional proofs and results as highlighted below:

- This paper provides the new Propositions 1, 2 & 3.
- We provide a new leave-one-out bound in Theorem 1 without any assumptions.
- Under the assumptions in Section 3.3, we provide a stricter leave-one-out error bound, which holds for both Type 1 & 2 support vectors.
- Exhaustive results for all the claims are provided for additional data sets.

2. Multiclass SVM

This section provides a brief description of the multiclass SVM formulation following (Crammer & Singer, 2002). Given i.i.d. training samples $(x_i, y_i)_{i=1}^{n}$, with $x \in \mathbb{R}^d$ and $y \in \{1, \ldots, L\}$, where n = number of training samples, d = dimensionality of the input space and L = total number of classes, the task of a multiclass linear classifier is to estimate a vector-valued function $f = [f_1, \ldots, f_L]$ for predicting the class labels of future unseen samples (x, y) using the decision rule $\hat{y} = \arg\max_{l=1,\ldots,L} f_l(x)$. The C&S multiclass SVM is a widely used formulation which generalizes the concept of a large margin classifier to multiclass problems.

Figure 1: Loss function for multiclass SVM with $f_k(x) = w_k^\top x$. A sample (x, y = k) lying inside the margin is penalized linearly using the slack variable $\xi$.
This multiclass SVM setting employs a special margin-based loss (similar to the hinge loss),

$$L(y, f(x)) = \Big[\max_{l}\big(f_l(x) + 1 - \delta_{yl}\big) - f_y(x)\Big]_+$$

where $[a]_+ = \max(0, a)$ and $\delta_{yl} = 1$ if $y = l$, $0$ if $y \neq l$ (see Fig. 1). Here, for any sample (x, y = k), having $L(y, f(x)) = 0$ ensures a margin-distance of +1 for the correct prediction, i.e. $f_k(x) - f_l(x) \ge 1;\ \forall l \neq k$. The multiclass SVM formulation (for linear parameterization) is provided below:

$$\min_{w_1 \ldots w_L,\, \xi} \quad \frac{1}{2}\sum_{l=1}^{L} \|w_l\|_2^2 + C \sum_{i=1}^{n} \xi_i \tag{1}$$
$$\text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge e_{il} - \xi_i; \quad e_{il} = 1 - \delta_{il}; \quad i = 1 \ldots n,\ l = 1 \ldots L$$

here, $f_l(x) = w_l^\top x$. Note that training samples falling inside the margin border ("+1") are linearly penalized using the slack variables $\xi_i \ge 0,\ i = 1 \ldots n$ (as shown in Fig. 1). These slack variables contribute to the empirical risk for the multiclass SVM formulation, $R_{emp}(w) = \sum_{i=1}^{n} \xi_i$. The SVM (see Footnote 1) formulation attempts to strike a balance between minimization of the empirical risk and the regularization term. This is controlled through the user-defined parameter $C \ge 0$.

Footnote 1: We refer to the C&S formulation in (1) as SVM throughout.

3. Multiclass Universum SVM

3.1. Multiclass U-SVM formulation

The idea of Universum learning was introduced by (Vapnik, 1998; 2006) to incorporate a priori knowledge about admissible data samples. Universum learning was introduced for binary classification, where in addition to labeled training data we are also given a set of unlabeled examples from the Universum. The Universum contains data that belongs to the same application domain as the training data. However, these samples are known not to belong to either class. In fact, this idea can also be extended to multiclass problems. For multiclass problems, in addition to the labeled training data we are also given a set of unlabeled examples from the Universum. These Universum samples are known not to belong to any of the classes in the training data. For example, if the goal of learning is to discriminate between handwritten digits 0, 1, 2, ..., 9, one can introduce additional knowledge in the form of handwritten letters A, B, C, ..., Z. These examples from the Universum contain certain information about handwriting styles, but they cannot be assigned to any of the classes (0 to 9). Also note that Universum samples do not have the same distribution as labeled training samples.

Figure 2: Loss function for universum samples $x^*$ for the k-th class decision boundary $w_k^\top x^* - \max_{l=1 \ldots L} w_l^\top x^* = 0$. A sample lying outside the $\Delta$-insensitive zone is penalized linearly using the slack variable $\zeta_k$.

These unlabeled Universum samples are introduced into the learning as contradictions and hence should lie close to the decision boundaries of all the classes 1 ... L. This argument follows from (Vapnik, 2006; Weston et al., 2006), where the universum samples lying close to the decision boundaries are more likely to falsify the classifier. To ensure this, we incorporate a $\Delta$-insensitive loss function for the universum samples (shown in Fig. 2). This $\Delta$-insensitive loss forces the universum samples to lie close to the decision boundaries ("0" in Fig. 2). Note that this idea of using a $\Delta$-insensitive loss for Universum samples has been previously introduced in (Weston et al., 2006) for binary classification. However, different from (Weston et al., 2006), here the $\Delta$-insensitive loss is introduced for the decision boundaries of all the classes, i.e. $w_k^\top x^* - \max_{l=1 \ldots L} w_l^\top x^* = 0;\ \forall k = 1 \ldots L$. This reasoning motivates the new multiclass Universum-SVM (MU-SVM) formulation where:

- The standard hinge loss is used for the training samples (shown in Fig. 1). This loss forces the training samples to lie outside the "+1" margin border.
- The universum samples are penalized by a $\Delta$-insensitive loss (see Fig. 2) for the decision functions of all the classes $f = [f_1, \ldots, f_L]$.

This leads to the MU-SVM formulation. Given training samples $T := (x_i, y_i)_{i=1}^{n}$, where $y_i \in \{1, \ldots, L\}$, and additional unlabeled universum samples $U := (x^*_{i'})_{i'=1}^{m}$, solve (see Footnote 2),

$$\min_{w_1 \ldots w_L,\, \xi,\, \zeta} \quad \frac{1}{2}\sum_{l=1}^{L} \|w_l\|_2^2 + C \sum_{i=1}^{n} \xi_i + C^* \sum_{i'=1}^{m}\sum_{k=1}^{L} \zeta_{i'k} \tag{2}$$
$$\text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge e_{il} - \xi_i; \quad e_{il} = 1 - \delta_{il}; \quad i = 1 \ldots n,\ l = 1 \ldots L$$
$$\big|\, w_k^\top x^*_{i'} - \max_{l=1 \ldots L} w_l^\top x^*_{i'} \,\big| \le \Delta + \zeta_{i'k}; \quad \zeta_{i'k} \ge 0; \quad i' = 1 \ldots m,\ k = 1 \ldots L$$

with $\delta_{il} = 1$ if $y_i = l$ and $0$ if $y_i \neq l$.

Footnote 2: Throughout this paper, we use the indices i, j for training samples, i' for universum samples and k, l for the class labels.

Here, for the k-th class decision boundary the universum samples $(x^*_{i'})_{i'=1}^{m}$ that lie outside the $\Delta$-insensitive zone are linearly penalized using the slack variables $\zeta_{i'k} \ge 0,\ i' = 1 \ldots m$. The user-defined parameters $C, C^* \ge 0$ control the trade-off between the margin size, the error on training samples, and the contradictions (samples lying outside the $\pm\Delta$ zone) on the universum samples. Note that for $C^* = 0$ eq. (2) reduces to the multiclass SVM classifier.

Proposition 1. For binary classification L = 2, (2) reduces to the standard U-SVM formulation in (Weston et al., 2006) with $w = w_1 - w_2$ and $b = 0$.

3.2. Computational Implementation of MU-SVM

This section describes the computational implementation of the MU-SVM formulation (2). Here, for every universum sample $x^*_{i'}$ we create L artificial samples belonging to all the classes, i.e. $(x^*_{i'}, y^*_{i'1} = 1), \ldots, (x^*_{i'}, y^*_{i'L} = L)$, as below,

$$(x_i, y_i) = \begin{cases} (x_i, y_i) & i = 1 \ldots n \\ (x^*_{i'}, y^*_{i'l}) & i = n+1 \ldots n+mL;\ i' = 1 \ldots m;\ l = 1 \ldots L \end{cases}$$
$$e_{il} = \begin{cases} e_{il} & i = 1 \ldots n;\ l = 1 \ldots L \\ -\Delta(1 - \delta_{il}) & i = n+1 \ldots n+mL;\ l = 1 \ldots L \end{cases} \tag{3}$$
$$C_i = \begin{cases} C & i = 1 \ldots n \\ C^* & i = n+1 \ldots n+mL \end{cases}$$

Proposition 2. Under transformation (3), the MU-SVM formulation in eq. (2) can be exactly solved using,

$$\min_{w_1 \ldots w_L,\, \xi} \quad \frac{1}{2}\sum_{l=1}^{L} \|w_l\|_2^2 + \sum_{i=1}^{n+mL} C_i\, \xi_i \tag{4}$$
$$\text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge e_{il} - \xi_i; \quad i = 1 \ldots n+mL,\ l = 1 \ldots L$$

The formulation (4) has the same form as (1) except that the former has additional mL constraints for the universum samples. Like most other SVM solvers, the MU-SVM formulation in (4) is also solved in its dual form as shown in Algorithm 1; see (Hsu & Lin, 2002). Hence, the computational complexity is the same as solving a multiclass SVM formulation (in (1)) with n + mL samples.
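The transformation in Section 3.2 can be illustrated with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names (`build_augmented_problem`, `slack`, `delta_insensitive_loss`) are hypothetical, class labels are taken as 0-based, and the weight vectors are supplied by hand rather than obtained from a solver. It builds the augmented samples, the margin targets $e_{il}$ and the per-sample costs $C_i$, and then checks numerically that the slack of an artificial universum sample equals the $\Delta$-insensitive loss of the original formulation.

```python
def build_augmented_problem(X, y, X_univ, num_classes, C, C_star, delta):
    """Sketch of transformation (3): each universum sample becomes L labeled copies.

    Labels are assumed to be 0-based integers in range(num_classes).
    Returns augmented samples, labels, margin targets e[i][l] and costs C_i.
    """
    X_aug, y_aug, e, C_vec = [], [], [], []
    for x_i, y_i in zip(X, y):
        X_aug.append(x_i)
        y_aug.append(y_i)
        # Training samples: e_il = 1 - delta_il (0 for the true class, 1 otherwise).
        e.append([0.0 if l == y_i else 1.0 for l in range(num_classes)])
        C_vec.append(C)
    for x_u in X_univ:
        for k in range(num_classes):
            X_aug.append(x_u)
            y_aug.append(k)
            # Artificial universum samples: e_il = -Delta * (1 - delta_il).
            e.append([0.0 if l == k else -delta for l in range(num_classes)])
            C_vec.append(C_star)
    return X_aug, y_aug, e, C_vec


def class_scores(W, x):
    """Linear decision values f_l(x) = w_l^T x for every class l."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]


def slack(W, x, label, e_row):
    """Smallest xi with (w_label - w_l)^T x >= e_l - xi for all l."""
    scores = class_scores(W, x)
    return max(e_l - (scores[label] - s) for e_l, s in zip(e_row, scores))


def delta_insensitive_loss(W, x, k, delta):
    """Loss [ |w_k^T x - max_l w_l^T x| - Delta ]_+ for a universum sample."""
    scores = class_scores(W, x)
    return max(0.0, abs(scores[k] - max(scores)) - delta)


# Toy usage with hand-picked (hypothetical) weights for L = 3 classes.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
X, y = [[2.0, 0.5], [0.1, 1.5]], [0, 1]
X_univ = [[0.5, -0.2]]
X_aug, y_aug, e, C_vec = build_augmented_problem(X, y, X_univ, 3, C=1.0, C_star=0.2, delta=0.1)
```

Because the true-class constraint always contributes a zero term, the slack computed on the augmented problem is non-negative without an explicit positivity constraint, which is one way to see why the universum terms fit the same solver interface as the training terms.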
Most off-the-shelf multiclass SVM solvers can be used for solving the proposed MU-SVM.

3.3. Model Selection

As presented in (5), the current MU-SVM algorithm has four tunable parameters: $C$, $C^*$, the kernel parameter, and $\Delta$. So in practice, multiclass SVM may yield better results than MU-SVM, simply because it has an inherently simpler model selection. Successful application of the proposed MU-SVM heavily depends on the optimal tuning of its model parameters. This paper adopts a simplified strategy for model selection previously used in (Cherkassky et al., 2011). This mainly involves two steps:

a. First, perform optimal tuning of the $C$ and kernel parameters for the multiclass SVM classifier. This step equivalently performs model selection for the parameters specific only to the training samples in the MU-SVM formulation (2).

b. Second, tune the parameter $\Delta$ while keeping $C$ and the kernel parameters fixed (as selected in Step a). The parameter $C^*/C = \frac{n}{mL}$ is kept fixed throughout the paper to ensure equal contribution of training and universum samples in the optimization formulation.

Algorithm 1: MU-SVM (dual form)

1. Given training samples $(x_i, y_i)_{i=1}^{n}$ and universum samples $(x^*_{i'})_{i'=1}^{m}$.

2. Transform using (3) and solve (5),

$$\max_{\alpha} \quad W(\alpha) = -\frac{1}{2}\sum_{i,j}\sum_{l} \alpha_{il}\,\alpha_{jl}\, K(x_i, x_j) - \sum_{i,l} \alpha_{il}\, e_{il} \tag{5}$$
$$\text{s.t.} \quad \sum_{l} \alpha_{il} = 0; \quad \alpha_{il} \le C_i \ \text{if}\ l = y_i; \quad \alpha_{il} \le 0 \ \text{if}\ l \neq y_i; \quad i, j = 1 \ldots n+mL,\ l = 1 \ldots L$$

3. Obtain the class label using the following decision rule: $\hat{y} = \arg\max_{l} \sum_{i} \alpha_{il}\, K(x_i, x)$.

This strategy selects an MU-SVM solution (in Step b) close to a given SVM solution (selected in Step a). The model parameters in Steps (a) & (b) are typically selected through resampling techniques such as leave-one-out (l.o.o) or stratified cross-validation approaches (Japkowicz & Shah, 2011). Of these approaches, l.o.o provides an almost unbiased estimate of the test error (Luntz, 1969; Schölkopf & Smola, 2001). However, on the downside it is very computationally intensive. In this paper, we propose a new analytic bound for the leave-one-out error of the MU-SVM formulation. The proposed bound can be used for model selection in Steps (a) & (b) and provides a computational edge over standard resampling techniques. A detailed discussion of this new l.o.o error bound is provided next.

Note that the l.o.o formulation with the t-th training sample dropped is the same as (5) with an additional constraint $\alpha_{tl} = 0;\ \forall l$.
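Step 3 of Algorithm 1 and the fixed cost ratio $C^*/C = n/(mL)$ can be sketched in a few lines. This is a minimal illustration, assuming the dual variables are already available as a list of per-sample rows `alpha[i][l]` (for example from an external multiclass SVM solver); the helper names `rbf_kernel`, `predict` and `universum_cost` are hypothetical, and classes are indexed from 0.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(alpha, X_sv, x, kernel):
    """Decision rule y_hat = argmax_l sum_i alpha[i][l] * K(x_i, x).

    alpha[i] holds the L dual variables of (augmented) sample i.
    """
    num_classes = len(alpha[0])
    k_vals = [kernel(x_i, x) for x_i in X_sv]
    scores = [sum(a[l] * k for a, k in zip(alpha, k_vals)) for l in range(num_classes)]
    return max(range(num_classes), key=scores.__getitem__)


def universum_cost(C, n, m, num_classes):
    """C* chosen so that C*/C = n / (m * L), as fixed in the model selection strategy."""
    return C * n / (m * num_classes)


# Toy usage: two samples, two classes; each row of alpha sums to zero.
X_sv = [[0.0, 0.0], [1.0, 1.0]]
alpha = [[0.5, -0.5], [-0.3, 0.3]]
kernel = lambda u, v: rbf_kernel(u, v, gamma=1.0)
label_near_first = predict(alpha, X_sv, [0.1, 0.0], kernel)
label_near_second = predict(alpha, X_sv, [1.0, 0.9], kernel)
c_star = universum_cost(1.0, n=300, m=500, num_classes=3)
```

With n = 300, m = 500 and L = 3 the ratio evaluates to 0.2, which matches the $C^*/C$ value quoted in the captions of Figs. 3-5.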
Then, the l.o.o error is given as:

$$R_{l.o.o} = \frac{1}{n}\sum_{t=1}^{n} \mathbb{1}[y_t \neq \hat{y}_t], \quad \text{where} \quad \hat{y}_t = \arg\max_{l} \sum_{i} \alpha^t_{il}\, K(x_i, x_t)$$

is the predicted class label for the t-th sample and $\alpha^t = [\underbrace{\alpha^t_{11}, \ldots, \alpha^t_{1L}}_{\alpha^t_1}, \ldots, \underbrace{\alpha^t_{t1} = 0, \ldots, \alpha^t_{tL} = 0}_{\alpha^t_t = 0}, \ldots]$ is the l.o.o solution. In this paper, we follow a strategy very similar to the one used in (Vapnik & Chapelle, 2000), and derive the new l.o.o bound for the MU-SVM formulation in (5). The necessary prerequisites are presented next.

Definition 1. (Support vector categories)

1. A support vector obtained from eq. (5) is called a Type 1 support vector if $0 < \alpha_{iy_i} < C_i$. This is represented as $SV_1 = \{\, i \mid 0 < \alpha_{iy_i} < C_i \,\}$.

2. A support vector obtained from eq. (5) is called a Type 2 support vector if $\alpha_{iy_i} = C_i$. This is represented as $SV_2 = \{\, i \mid \alpha_{iy_i} = C_i \,\}$.

The set of all support vectors is represented as $SV = SV_1 \cup SV_2$. Similarly, the set of support vectors for the l.o.o solution is given as $SV^t$.

Under Definition 1 we have,

Lemma 1. If in the leave-one-out procedure a Type 1 support vector $x_t$ is classified incorrectly, then we have $S_t \max\!\big(\sqrt{2}D, \tfrac{1}{\sqrt{C}}\big) \ge 1$, where,

$$S_t^2 = \min_{\beta} \sum_{i,j}\Big(\sum_{l} \beta_{il}\,\beta_{jl}\Big) K(x_i, x_j) \tag{6}$$
$$\text{s.t.} \quad \alpha_{il} - \beta_{il} \le C_i; \quad \forall \{(i \neq t, l) \mid 0 < \alpha_{il} < C_i;\ l = y_i\}$$
$$\alpha_{il} - \beta_{il} \le 0; \quad \forall \{(i \neq t, l) \mid \alpha_{il} < 0;\ l \neq y_i\}$$
$$\beta_{il} = 0; \quad \forall i \notin SV_1 - \{t\},\ l = 1 \ldots L$$
$$\beta_{tl} = \alpha_{tl}; \quad \forall l = 1 \ldots L$$
$$\sum_{l} \beta_{il} = 0$$

$S_t$ := Span of the Type 1 support vector $x_t$.
$D$ := Diameter of the smallest hypersphere containing all training samples.

This leads to the following upper bound on the l.o.o error.

Theorem 1. The leave-one-out error is upper bounded as:

$$R_{l.o.o} \le \frac{1}{n}\big(|\Psi_1| + |\Psi_2|\big) \tag{7}$$
$$\Psi_1 := \{\, t \in SV_2 \cap T \,\}; \qquad \Psi_2 := \Big\{\, t \in SV_1 \cap T \;\Big|\; S_t \max\!\big(\sqrt{2}D, \tfrac{1}{\sqrt{C}}\big) \ge 1 \,\Big\}$$

where $|\cdot|$ := cardinality of a set and $T$ := training set.

Following Theorem 1, it is desirable to select a model with a) a lower number of Type 2 training support vectors and b) a smaller span for the Type 1 training support vectors. Roughly, for a fixed number of Type 2 support vectors a solution with a smaller span value (for the Type 1 training support vectors) could yield lower test error.
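The counting argument behind Theorem 1 can be sketched as follows. This is a hedged illustration only: it assumes the spans $S_t$ have already been computed by some other routine, that the condition in Lemma 1 reads $S_t \max(\sqrt{2}D, 1/\sqrt{C}) \ge 1$, and that training samples occupy indices 0 to n-1 of the augmented problem (so that universum copies are excluded from the count). The function names and the numerical tolerance `tol` are hypothetical choices.

```python
import math


def categorize_support_vectors(alpha_true, C_vec, tol=1e-8):
    """Split support vectors by Definition 1 using alpha_{i y_i}.

    alpha_true[i] is the dual variable of sample i for its own class.
    Returns (type1, type2) index lists; non-support vectors appear in neither.
    """
    type1, type2 = [], []
    for i, (a, c) in enumerate(zip(alpha_true, C_vec)):
        if abs(a - c) <= tol:
            type2.append(i)      # alpha_{i y_i} = C_i
        elif tol < a < c:
            type1.append(i)      # 0 < alpha_{i y_i} < C_i
    return type1, type2


def theorem1_bound(type1, type2, spans, D, C, n_train):
    """Evaluate (|Psi_1| + |Psi_2|) / n for the training samples only.

    spans maps a Type 1 training index t to its (precomputed) span S_t.
    """
    factor = max(math.sqrt(2.0) * D, 1.0 / math.sqrt(C))
    psi1 = [t for t in type2 if t < n_train]
    psi2 = [t for t in type1 if t < n_train and spans[t] * factor >= 1.0]
    return (len(psi1) + len(psi2)) / n_train


# Toy usage: 3 training samples followed by 2 artificial universum samples.
alpha_true = [0.0, 0.4, 1.0, 0.2, 0.5]
C_vec = [1.0, 1.0, 1.0, 0.5, 0.5]
type1, type2 = categorize_support_vectors(alpha_true, C_vec)
bound_small_span = theorem1_bound(type1, type2, {1: 0.1}, D=1.0, C=1.0, n_train=3)
bound_large_span = theorem1_bound(type1, type2, {1: 1.0}, D=1.0, C=1.0, n_train=3)
```

The two calls differ only in the span assigned to the single Type 1 training support vector, which shows how a smaller span removes that sample from $\Psi_2$ and lowers the bound.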
The following proposition shows how the universum samples influence these span values in (6).

Proposition 3. If the Type 1 training support vectors, i.e. $t \in SV_1 \cap T$, for the SVM and MU-SVM solutions remain the same, then $S_t^{SVM} \ge S_t^{MU\text{-}SVM};\ \forall t \in SV_1 \cap T$.

Loosely speaking, for cases where the type of the training support vectors remains the same, introducing universum samples through the MU-SVM formulation could result in smaller span values and better generalization for future test data compared to the standard SVM solution.

Now, Theorem 1 provides an analytic tool for model selection with small l.o.o error. Here, the right hand side of (7) serves as a leave-one-out error estimate, and the goal is to select a model parameter which minimizes this value. However, the practical utility of (7) is limited due to the significant computational complexity involved in estimating the span of the Type 1 training support vectors, $O((n + mL)^4)$ (worst case). Next, we provide a more computationally attractive alternative to the above l.o.o bound.

Assumption: For the MU-SVM solution,

i. The sets of support vectors of the Type 1 and Type 2 categories remain the same during the leave-one-out procedure.

ii. The dual variables of the Type 1 support vectors have only two active elements, i.e. $\forall \alpha_i$ s.t. $\{0 < \alpha_{iy_i} < C_i\} \Rightarrow \exists\, k \neq y_i$ s.t. $\alpha_{ik} = -\alpha_{iy_i}$.

Lemma 2. Under the above assumptions the following equality holds for both Type 1 & 2 support vectors,

$$S_t^2 = \Big[\alpha_t^\top \sum_{i \in SV} \alpha_i\, K(x_i, x_t) - \alpha_{ty_t}\, g_{y_t k}^\top \sum_{i \in SV^t} \alpha^t_i\, K(x_i, x_t)\Big] \tag{8}$$

with $S_t^2 = \big\{ \min_{\beta} \sum_{i,j}\big(\sum_{l} \beta_{il}\beta_{jl}\big) K(x_i, x_j) \;\big|\; \beta_{tl} = \alpha_{tl};\ \sum_{l} \beta_{il} = 0;\ \forall (i, j) \in SV_1 \big\}$ and $g_{y_t k} = [0, \ldots, \underbrace{1}_{y_t\text{-th}}, \ldots, \underbrace{-1}_{k\text{-th}}, \ldots, 0]$; $k = \arg\max_{q \neq y_t} \sum_{j} \alpha^t_{jq}\, K(x_j, x_t)$.

Now $S_t$ can be efficiently computed using Lemma 3.

Lemma 3. Under Assumptions (i) & (ii),

$$S_t^2 = \begin{cases} \alpha_t^\top \big[(H^{-1})_{tt}\big]^{-1} \alpha_t & t \in SV_1 \cap T \\ \alpha_t^\top \big[K(x_t, x_t) \otimes I_L - K_t^\top H^{-1} K_t\big] \alpha_t & t \in SV_2 \cap T \end{cases}$$

here,

$$H := \begin{bmatrix} K_{SV_1} \otimes I_L & A^\top \\ A & 0 \end{bmatrix}; \quad A := I_{|SV_1|} \otimes (1_L)^\top; \quad 1_L = [\underbrace{1\ 1\ \ldots\ 1}_{L \text{ elements}}]^\top$$

$(H^{-1})_{tt}$ := sub-matrix of $H^{-1}$ for indices $i = (t-1)L + 1, \ldots, tL$.
$K_{SV_1}$ := kernel matrix of the Type 1 support vectors.
$K_t = [(k_t^\top \otimes 1_L)\ \ 0_{L \times |SV_1|}]^\top$ where $k_t$ is the $n_{SV_1} \times 1$ dimensional vector whose i-th element is $K(x_i, x_t),\ \forall x_i \in SV_1$; and $\otimes$ is the Kronecker product.

Finally, we have,

Theorem 2. Under Assumptions (i) & (ii) the leave-one-out error is upper bounded as:

$$R_{l.o.o} \le \frac{1}{n}\,|\Psi_3| \tag{9}$$
$$\Psi_3 = \Big\{\, t \in SV \cap T \;\Big|\; S_t^2 \ge \alpha_t^\top \Big[\sum_{i \in SV} \alpha_i\, K(x_i, x_t)\Big] \Big\}$$

where $T$ := training set and $S_t$ is defined in Lemma 3.

Note that, similar to (Vapnik & Chapelle, 2000), the assumptions (i) & (ii) are not satisfied in most cases. Nevertheless, Theorem 2 provides a good approximation of the l.o.o procedure (see Section 4.2.2). In addition, compared to Theorem 1, it provides the following advantages:

- Eq. (9) is valid for both Type 1 & 2 training support vectors and typically results in a stricter bound.
- Span computation for all support vectors requires inverting the H matrix only once (Lemma 3). This results in an overall cost of $O((n + mL)^3)$ for computing (9) and provides a computational edge over (7), which involves a cost of $O((n + mL)^4)$.

Empirical results for model selection using Theorem 2 are provided in Section 4.2.2.

4. Empirical Results

4.1. Datasets and Experimental settings

Our empirical results use three real-life datasets:

German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012): The goal here is to identify the traffic signs for the speed zones "30", "70" and "80". Here, the images are represented by their histogram of gradient (HOG 1) features. The experimental setting is provided in Table 1. For this data we use three kinds of Universum:

- Random Averaging: synthetically created by first selecting a random traffic sign from each class ("30", "70" and "80") in the training set and averaging them.

- Non-Speed: all other non-speed zone traffic signs.
- Sign "priority-road": An exhaustive search over several non-speed zone traffic signs showed this universum to provide the best performance (Appendix B.3).

Handwritten characters (ABCDETC) (Weston et al., 2006): The data consists of images of handwritten digits "0"-"9", uppercase letters "A"-"Z", lowercase letters "a"-"z" and some additional symbols: ! ? , . ; : = - + / / ( ) $ The goal here is to identify the handwritten digits "0"-"3" based on their pixel values. We use four different types of universum: Upper: "A"-"Z"; Lower: "a"-"z"; Symbols: all additional symbols; and Random Averaging (RA), obtained by randomly averaging the training samples.

Speech-based Isolated Letter Recognition (ISOLET) (Fanty & Cole, 1991): This is a speech recognition dataset where 150 subjects spoke the name of each letter "a"-"z" twice. The goal is to identify the spoken letters "a"-"e" using the spectral coefficients, contour features, sonorant features, pre-sonorant features, and post-sonorant features. We use two different types of universum: Others, which consists of all other speech, i.e. "f"-"z", and Random Averaging (RA).

Table 1: Real-life datasets.

DATASET    TRAIN/TEST SIZE                      DIMENSION
GTSRB      300 / 1500 (100 / 500 per class)     1568 (HOG features)
ABCDETC    600 / 400 (150 / 100 per class)      10000 (100 x 100 pixel)
ISOLET     500 / 500 (100 / 100 per class)      617

Note that, to simplify our analysis (in Section 4.2.1), we used a subset of the training classes. However, similar results can be expected using all the training classes (Appendix B.2). Our initial experiments suggest that linear parameterization is optimal for the GTSRB dataset; hence only the linear kernel has been used for it. For the ABCDETC and ISOLET datasets an RBF kernel of the form $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$ with $\gamma = 2^{-7}$ provided optimal results for SVM.
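The Random Averaging (RA) universum described for the datasets above can be sketched as follows. This is one plausible reading of the description (pick one random training sample from every class and average the picks feature-wise), not the authors' code; the function name `random_averaging_universum`, the `seed` argument and the list-of-lists data layout are assumptions made for the illustration.

```python
import random


def random_averaging_universum(X, y, m, seed=0):
    """Create m synthetic universum samples by Random Averaging.

    For each universum sample, one training sample is drawn at random from
    every class and the drawn samples are averaged feature-wise.
    """
    rng = random.Random(seed)
    by_class = {}
    for x_i, y_i in zip(X, y):
        by_class.setdefault(y_i, []).append(x_i)
    classes = sorted(by_class)
    universum = []
    for _ in range(m):
        picks = [rng.choice(by_class[c]) for c in classes]
        universum.append([sum(vals) / len(picks) for vals in zip(*picks)])
    return universum


# Toy usage: one sample per class, so every RA sample is the same average.
X = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
y = [0, 1, 2]
U = random_averaging_universum(X, y, m=5)
```

Since each RA sample mixes one example from every class, it is by construction not attributable to any single class, which is the property the universum is meant to have.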
For all the experiments model selection is done over the range of parameters $C = [10^{-4}, \ldots, 10^{3}]$, $C^*/C = \frac{n}{mL}$ and $\Delta = [0, 0.01, 0.05, 0.1]$ using stratified 5-fold cross validation.

4.2. Results

4.2.1. Comparison between SVM vs. MU-SVM

Performance comparisons between SVM and MU-SVM for the different types of Universum are shown in Table 2. The table shows the average test error over 10 random training/test partitionings of the data in similar proportions as shown in Table 1.

Table 2: Mean (± standard deviation) of the test errors (in %) over 10 runs of the experimental setting in Table 1. The three MU-SVM columns correspond to three different numbers of universum samples; values lost in extraction are shown as "…".

GTSRB (SVM: 7.54 ± 0.82)
MU-SVM UNIVERSUM    NO. OF UNIVERSUM SAMPLES (three settings)
PRIORITY ROAD       6.97 ± …      … ± …      … ± 0.78
RA                  7.08 ± …      … ± …      … ± 0.43
NON-SPEED           … ± …         … ± …      … ± 0.93

ABCDETC (SVM: 27.1 ± 3.5)
UPPER               26.5 ± …      … ± …      … ± 4.0
LOWER               25 ± …        … ± …      … ± 3.1
SYMBOLS             23.5 ± …      … ± …      … ± 3.2
RA                  23.2 ± …      … ± …      … ± 3.2

ISOLET (SVM: 3.6 ± 0.3)
RA                  3.05 ± …      … ± …      … ± 0.28
OTHERS              3.50 ± …      … ± …      … ± 0.3

As seen from Table 2, MU-SVM provides better generalization than SVM. In fact, for certain universum types, like Priority-Road for GTSRB and RA for ABCDETC and ISOLET, MU-SVM significantly outperforms the multiclass SVM model. In such cases, the performance gains improve significantly, up to 20-25%, with the increase in the number of universum samples, and stagnate for a significantly large universum set size. This indicates that for a sufficiently large universum set size the effectiveness of MU-SVM depends mostly on the type (statistical characteristics) of the universum data. For a better understanding of such statistical characteristics, we adopt the technique of histogram of projections originally introduced for binary classification in (Cherkassky & Dhar, 2010). However, different from binary classification, here we project a training sample (x, y = k) onto the decision space for that class, i.e. $w_k^\top x - \max_{l \neq k} w_l^\top x = 0$, and the universum samples onto the decision spaces of all the classes. Finally, we generate the histograms of the projection values for our analysis.
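The histogram-of-projections computation described above can be sketched as follows. This is an illustrative reading under stated assumptions: a linear model with one weight vector per class (0-based class indices), a projection value defined as $w_k^\top x - \max_{l \neq k} w_l^\top x$, and hypothetical helper names (`class_scores`, `projection`, `histogram`, `predicted_label_frequency`). Training samples would be projected only for their own class, universum samples for every class.

```python
from collections import Counter


def class_scores(W, x):
    """Linear decision values f_l(x) = w_l^T x for every class l."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]


def projection(W, x, k):
    """Projection onto the class-k decision space: w_k^T x - max_{l != k} w_l^T x."""
    scores = class_scores(W, x)
    rival = max(s for l, s in enumerate(scores) if l != k)
    return scores[k] - rival


def histogram(values, edges):
    """Count values per bin [edges[b], edges[b+1]); the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(edges) - 1):
            last = b == len(edges) - 2
            if edges[b] <= v < edges[b + 1] or (last and v == edges[-1]):
                counts[b] += 1
                break
    return counts


def predicted_label_frequency(W, X_univ):
    """Frequency of the predicted class labels over the universum samples."""
    labels = []
    for x in X_univ:
        scores = class_scores(W, x)
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return Counter(labels)


# Toy usage with hand-picked (hypothetical) weights for three classes.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
train_proj = projection(W, [2.0, 0.5], 0)                      # training sample of class 0
univ_proj = [projection(W, [0.5, -0.2], k) for k in range(3)]  # universum, all classes
counts = histogram([train_proj] + univ_proj[:2], [-2, -1, 0, 1, 2])
freq = predicted_label_frequency(W, [[0.5, -0.2], [0.0, 1.0]])
```

A projection value of at least +1 means the sample lies outside the margin border of that class, while universum values concentrated near 0 indicate samples close to the decision boundary.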
Further, in addition to the histograms, we also generate the frequency plot of the predicted labels for the universum samples. Fig. 3 shows the typical histograms and frequency plots for the SVM and MU-SVM models for the GTSRB dataset using the "priority-road" sign (as universum). As seen from Fig. 3, the optimal SVM model has high separability for the training samples, i.e. most of the training samples lie outside the margin borders. In fact, similar to binary SVM (Cherkassky & Dhar, 2010), we see data-piling effects for the training samples near the "+1" margin borders of the decision functions for all the classes. This is typically seen under high-dimensional low sample size settings. However, the universum samples ("priority-road") are widely spread about the margin borders. Moreover, here the universum samples are biased towards the positive side of the decision boundary of the sign "30" (Fig. 3(a)) and hence predominantly get classified as sign "30" (Fig. 3(d)). As seen from Figs. 3(e)-(h), applying the MU-SVM model preserves the separability of the training samples and additionally reduces the spread of the universum samples. Such a model exhibits uncertainty on the universum samples' class membership, and uniformly assigns them over all the classes, i.e. signs "30", "70" and "80" (Fig. 3(h)). This shows that the resulting MU-SVM model has higher contradiction (uncertainty) on the universum samples and hence provides better generalization compared to SVM.

Figure 3: Typical histogram of projection for training samples (n = 300) (shown in blue) and universum samples "priority-road" (m = 500) (shown in red). SVM decision functions (with C = 1) for (a) sign "30", (b) sign "70", (c) sign "80". (d) Frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.2$, $\Delta = 0.01$) for (e) sign "30", (f) sign "70", (g) sign "80". (h) Frequency plot of predicted labels for universum samples using the MU-SVM model.

Figure 4: Typical histogram of projection for training samples (n = 300) (shown in blue) and universum samples "Random Averaging" (m = 500) (shown in red). SVM decision functions (with C = 1) for (a) sign "30", (b) sign "70", (c) sign "80". (d) Frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.2$, $\Delta = 0$) for (e) sign "30", (f) sign "70", (g) sign "80". (h) Frequency plot of predicted labels for universum samples using the MU-SVM model.

Figure 5: Typical histogram of projection for training samples (n = 300) (shown in blue) and universum samples "Others" (m = 500) (shown in red). SVM decision functions (with C = 1) for (a) sign "30", (b) sign "70", (c) sign "80". (d) Frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.2$, $\Delta = 0.05$) for (e) sign "30", (f) sign "70", (g) sign "80". (h) Frequency plot of predicted labels for universum samples using the MU-SVM model.

Fig. 4 shows the histograms and the frequency plots for the SVM and MU-SVM models for the RA universum. As shown in Fig. 4(a), the SVM model already results in a narrow distribution of the universum samples and in turn provides near random prediction on the universum samples (Fig. 4(d)). Applying MU-SVM in this case provides no significant change to the multiclass SVM solution and hence no additional improvement in generalization (see Table 2 and Figs. 4(e)-(h)).

Finally, we provide the histograms and the frequency plots for the SVM and MU-SVM models for the Non-Speed Universum samples. In this case, although the universum samples are widely spread about the SVM margin borders (Figs. 5(a)-(c)), the uncertainty on the universum samples' class membership is uniform across all the classes (Fig. 5(d)). Applying MU-SVM reduces the spread of the universum samples (Figs. 5(e)-(g)). However, it does not significantly increase the contradiction (uncertainty) on the universum samples (compare Figs. 5(d) vs. (h)). Hence, applying MU-SVM does not provide any significant improvement over the SVM model (see Table 2). The histograms for the other datasets provide similar insights and are provided in Appendix B.4.

This section shows that for high-dimensional low sample size settings, MU-SVM provides better generalization than multiclass SVM. Under such settings the training data exhibits large data-piling effects near the margin border ("+1"). For such ill-posed settings, introducing the Universum can provide improved generalization over the multiclass SVM solution. However, the effectiveness of MU-SVM also depends on the properties of the universum data. Such statistical characteristics of the training and universum samples, which determine the effectiveness of MU-SVM, can be conveniently captured using the histogram-of-projections method introduced in this paper.

Table 3: Performance comparisons for model selection using cross validation vs. the analytic bound in Theorem 2. Train/test partitioning follows Table 1. No. of universum samples m = 1000. Model parameters used: $C^*/C = \frac{n}{mL}$, $\Delta = [0, 0.01, 0.05, 0.1]$. Values lost in extraction are shown as "…".

                              5-FOLD CV                               THEOREM 2
DATASET   MU-SVM              TEST ERROR (%)   TIME (x 10^4 sec)      TEST ERROR (%)   TIME (x 10^4 sec)
GTSRB     PRIORITY ROAD       5.5 ± …          … ± …                  … ± …            … ± 0.1
GTSRB     RA                  6.9 ± …          … ± …                  … ± …            … ± 0.3
GTSRB     NON-SPEED           6.9 ± …          … ± …                  … ± …            … ± 0.5
ABCDETC   UPPER               26.1 ± …         … ± …                  … ± …            … ± 0.1
ABCDETC   LOWER               24.2 ± …         … ± …                  … ± …            … ± 0.1
ABCDETC   SYMBOLS             23.3 ± …         … ± …                  … ± …            … ± 0.09
ABCDETC   RA                  22.1 ± …         … ± …                  … ± …            … ± 0.1
ISOLET    RA                  2.8 ± …          … ± …                  … ± …            … ± 0.7
ISOLET    OTHERS              3.3 ± …          … ± …                  … ± …            … ± 0.5

4.2.2. Effectiveness using analytic bound in Theorem 2

Next we illustrate the practical utility of the bound in Theorem 2 for model selection.
First, we provide a comparison between the error estimates using 5-fold cross validation (CV) vs. Theorem 2 (see Footnote 3). For illustration we use the GTSRB dataset under the experimental setting provided in Table 1. Fig. 6(a) shows the average error estimates using 5-fold CV and Theorem 2 as well as the true test error for the MU-SVM model using "priority-road" over the range of parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$ with fixed $\Delta = 0$. The results are obtained over 10 random partitionings of the training/test dataset. Fig. 6(a) shows that the error estimates using Theorem 2 follow a very similar pattern to 5-fold CV and the test error. This shows that the model parameter $C^*/C = 10^{-1}$ that minimizes the l.o.o error estimate in Theorem 2 also minimizes the test error and the 5-fold CV error. Hence, Theorem 2 provides a practical alternative to model selection using resampling techniques.

Throughout our results we observe that the error estimates using Theorem 2 are uniformly lower than the 5-fold CV and test errors. This can be attributed to two main reasons. First, for high-dimensional low sample size settings, the majority of the training samples lie outside the margin borders (see Figs. 3-5). This results in a significantly low proportion of training SVs, and hence a low l.o.o error in general. Secondly, Theorem 2 holds under the additional assumptions (i) & (ii), and is further constrained compared to Theorem 1. Hence, Theorem 2 is an underestimator of the l.o.o bound in Theorem 1. Of course, for the purpose of model selection we are only interested in the pattern, rather than the scale, of the error estimates. Hence, such a difference in scale will not impact the model selection. However, to further simplify our illustrations, we also provide a scale-invariant ranking curve of the model parameters in Fig. 6(b). The figure shows the average rankings of the model parameters based on the error estimate values over each experiment.
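A scale-invariant ranking curve of this kind can be sketched as follows: within each experiment the candidate parameters are ranked by their error estimate (rank 1 for the smallest), and the ranks are then averaged across experiments. The function names and the tie-breaking rule (earlier parameter wins) are assumptions made for this illustration, and the numbers in the usage example are made up purely to exercise the code.

```python
def rank_estimates(estimates):
    """Rank error estimates within one experiment (1 = smallest estimate).

    Ties are broken by position, which is an assumption of this sketch.
    """
    order = sorted(range(len(estimates)), key=lambda j: estimates[j])
    ranks = [0] * len(estimates)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def average_ranks(estimates_per_experiment):
    """Average the per-experiment ranks of each model parameter."""
    all_ranks = [rank_estimates(e) for e in estimates_per_experiment]
    n_experiments = len(all_ranks)
    return [sum(col) / n_experiments for col in zip(*all_ranks)]


# Toy usage: 2 experiments, 4 candidate parameter values (illustrative numbers only).
estimates = [
    [0.30, 0.20, 0.10, 0.40],
    [0.25, 0.15, 0.20, 0.50],
]
avg = average_ranks(estimates)
best_index = min(range(len(avg)), key=avg.__getitem__)
```

Because only the ordering within each experiment matters, two estimators that differ in scale but agree on the ordering of the parameters produce the same ranking curve.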
Here, for each experiment we rank the model parameter with the smallest error estimate as 1 and the parameter with the largest estimate as 4, and average these rank values over the 10 experiments. The parameter with the smallest rank value (closest to 1 in Fig. 6(b)) is typically selected through the model selection strategy. Finally, as seen from Figs. 6(a)-(b), although different in scale, the error estimates using Theorem 2 correctly capture the pattern of the test error and select the model parameter with the smallest test error (i.e. $C^*/C = 10^{-1}$). A similar comparison over the range of parameters $\Delta = [0.001, 0.01, 0.1, 1]$ with fixed $C^*/C = \frac{n}{mL} = 0.1$ is also provided in Fig. 7. Here, compared to 5-fold CV, Theorem 2 correctly selects the optimal parameter $\Delta = 0.01$ with the smallest test error (Fig. 7(b)).

Footnote 3: Note that Theorem 2 approximates Theorem 1 to provide an upper bound on the l.o.o error. Hence, a good comparison would be between Theorem 2 vs. Theorem 1 and the l.o.o error. However, results using l.o.o and Theorem 1 were prohibitively slow and hence could not be reported in this paper. As an alternative, we compare the error estimates from Theorem 2 with 5-fold cross validation (CV) and the test error. The objective is to illustrate that, similar to 5-fold CV, using Theorem 2 we can obtain the optimal model parameters providing the smallest test error.

Figure 6: Performance of MU-SVM with "priority-road" universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$, C = 1, $\Delta = 0$. (b) Ranking of the model parameters based on the error estimate values over each experiment.

Figure 7: Performance of MU-SVM with "priority-road" universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, C = 1, $C^*/C = \frac{n}{mL} = 0.1$. (b) Ranking of the model parameters based on the error estimate values over each experiment.

As seen from Figs. 6 and 7, the model parameters minimizing the error estimates in Theorem 2 also minimize the true test error. This can also be seen for all other datasets in Table 1 (Appendix B.5). Hence, Theorem 2 provides a practical alternative to resampling techniques for model selection. This is further confirmed by the results in Table 3. Table 3 shows the average test error over 10 random training/test partitionings of the data in similar proportions as shown in Table 1. Here, the MU-SVM models selected using Theorem 2 provide similar generalization error compared to the models selected through 5-fold CV. Further, the proposed model selection strategy using Theorem 2 involves an $O((n + mL)^3)$ operation, and provides a computational edge over standard resampling techniques.
Table 3 provides the average time (in seconds) for the MU-SVM model selection using Theorem 2 vs. 5-fold CV for 10 runs over the entire range of parameters. The experiments were run on a desktop with a 12-core Intel CPU and 32 GB RAM. As seen from Table 3, the bound-based model selection is 2-4 times faster than the standard 5-fold resampling technique.

5. Conclusions

We introduced a new universum-based formulation for multiclass SVM (MU-SVM). The proposed formulation embodies several useful mathematical properties amenable to: a) an efficient implementation of the MU-SVM formulation using existing multiclass SVM solvers, and b) deriving practical analytic bounds for model selection. We empirically demonstrated the effectiveness of the proposed formulation as well as the bound on real-world datasets. In addition, we also provided insights into the underlying behavior of universum learning and its dependence on the choice of universum samples using the proposed histogram-of-projections method.

References

Chen, Shuo and Zhang, Changshui. Selecting informative universum sample for semi-supervised learning. In IJCAI, 2009.
Chen, Xiaohong, Yin, Hujun, Hu, Menglei, and Wang, Liping. Universum Discriminant Canonical Correlation Analysis. Springer International Publishing, Cham, 2017.
Cherkassky, Vladimir and Dhar, Sauptik. Simple method for interpretation of high-dimensional nonlinear SVM classification models. In Stahlbock, Robert, Crone, Sven F., Abou-Nasr, Mahmoud, Arabnia, Hamid R., Kourentzes, Nikolaos, Lenca, Philippe, Lippe, Wolfram-Manfred, and Weiss, Gary M. (eds.), DMIN. CSREA Press.
Cherkassky, Vladimir and Mulier, Filip M. Learning from Data: Concepts, Theory, and Methods. Wiley-IEEE Press, 2007.
Cherkassky, Vladimir, Dhar, Sauptik, and Dai, Wuyang. Practical conditions for effectiveness of the universum learning. Neural Networks, IEEE Transactions on, 22(8), 2011.
Crammer, Koby and Singer, Yoram. On the learnability and design of output codes for multiclass problems. Machine Learning, 47(2-3), 2002.
Dhar, Sauptik and Cherkassky, Vladimir. Development and evaluation of cost-sensitive universum-SVM. Cybernetics, IEEE Transactions on, 45(4), 2015.
Dhar, Sauptik and Cherkassky, Vladimir. Universum learning for SVM regression. In Neural Networks (IJCNN), 2017 International Joint Conference on. IEEE, 2017.
Dhar, Sauptik, Ramakrishnan, Naveen, Cherkassky, Vladimir, and Shah, Mohak. On multiclass universum learning.
Dhar, Sauptik, Ramakrishnan, Naveen, Cherkassky, Vladimir, and Shah, Mohak. Universum learning for multiclass SVM. arXiv preprint.
Dietterich, Thomas G and Bakiri, Ghulum. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2.
Fanty, Mark and Cole, Ronald. Spoken letter recognition. In Advances in Neural Information Processing Systems.
Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron. Deep Learning. MIT Press, 2016. deeplearningbook.org.
Hsu, Chih-Wei and Lin, Chih-Jen. A comparison of methods for multiclass support vector machines. Neural Networks, IEEE Transactions on, 13(2), 2002.
Japkowicz, Nathalie and Shah, Mohak. Evaluating learning algorithms: a classification perspective. Cambridge University Press.
Lu, Shuxia and Tong, Le. Weighted twin support vector machine with universum. Advances in Computer Science: an International Journal, 3(2):17-23, 2014.
Luntz, Aleksandr. On estimation of characters obtained in statistical procedure of recognition. Technicheskaya Kibernetica.
Platt, John C, Cristianini, Nello, and Shawe-Taylor, John. Large margin DAGs for multiclass classification. In NIPS, volume 12.
Qi, Zhiquan, Tian, Yingjie, and Shi, Yong. A nonparallel support vector machine for a classification problem with universum learning. Journal of Computational and Applied Mathematics, 263, 2014.
Schölkopf, Bernhard and Smola, Alexander J. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.
Shen, Chunhua, Wang, Peng, Shen, Fumin, and Wang, Hanzi. Uboost: Boosting with the universum. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(4), 2012.
Sinz, F. A priori knowledge from non-examples. PhD thesis.
Sinz, FH., Chapelle, O., Agarwal, A., and Schölkopf, B. An analysis of inference with the universum. In Advances in Neural Information Processing Systems 20, NY, USA, 2008. Curran.
Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks.
Vapnik, V. Estimation of Dependences Based on Empirical Data (Information Science and Statistics). Springer.
Vapnik, Vladimir and Chapelle, Olivier. Bounds on error expectation for support vector machines. Neural Computation, 12(9), 2000.
Vapnik, Vladimir N. Statistical Learning Theory. Wiley-Interscience.
Wang, Zhe and Xue, Xiangyang. Multi-class support vector machine. In Support Vector Machines Applications. Springer.
Wang, Zhe, Zhu, Yujin, Liu, Wenwen, Chen, Zhihua, and Gao, Daqi. Multi-view learning with universum. Knowledge-Based Systems, 70, 2014.
Weston, Jason and Watkins, Chris. Multi-class support vector machines. Technical report, Citeseer.
Weston, Jason, Collobert, Ronan, Sinz, Fabian, Bottou, Léon, and Vapnik, Vladimir. Inference with the universum. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
Xu, Yitian, Chen, Mei, and Li, Guohui. Least squares twin support vector machine with universum data for classification. International Journal of Systems Science, pp. 1-9, 2015.
Xu, Yitian, Chen, Mei, Yang, Zhiji, and Li, Guohui. ν-twin support vector machine with universum data for classification. Applied Intelligence, 44(4), 2016.
Zhang, Dan, Wang, Jingdong, Wang, Fei, and Zhang, Changshui. Semi-supervised classification with universum. In SDM. SIAM, 2008.
Zhang, Xiang and LeCun, Yann. Universum prescription: Regularization using unlabeled data. In AAAI.
Zhu, Changming. Improved multi-kernel classification machine with Nyström approximation technique and universum data. Neurocomputing, 175, 2016.

Appendix

Contents
A Proofs
A.1 Proof of Proposition 1
A.2 Proof of Proposition 2
A.3 Derivation of Algorithm 1
A.4 Proof of Lemma 1
A.5 Proof of Theorem 1
A.6 Proof of Proposition 3
A.7 Proof of Lemma 2
A.8 Proof of Lemma 3
A.9 Proof of Theorem 2
B Additional Results
B.1 ECOC vs. Direct Approach
B.2 SVM vs. MU-SVM using all training classes
B.3 Performance comparisons for several Universum types with varying Training set size for GTSRB dataset
B.4 Additional Histogram of Projections
B.5 Comparison of the error estimates using 5-Fold CV vs. Theorem 2

A Proofs

The references cited in this document follow the numbering used in the main paper.

A.1 Proof of Proposition 1

Such a proposition is available for multiclass SVMs (Crammer & Singer, 2002). Here, we provide a proof for the MU-SVM formulation. Formulation (2) for binary classification becomes,

$$\min_{w_1, w_2, \xi, \zeta} \; \frac{1}{2}\left(\|w_1\|_2^2 + \|w_2\|_2^2\right) + C\sum_{i=1}^{n}\xi_i + C^*\sum_{i'=1}^{m}\left(\zeta_{i'1} + \zeta_{i'2}\right) \tag{10}$$

$$\text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge e_{il} - \xi_i; \quad e_{il} = 1 - \delta_{il}, \quad i = 1 \dots n, \; l = 1, 2$$

$$\left|\left(w_k^\top x^*_{i'} - \max_{l=1,2} w_l^\top x^*_{i'}\right)\right| \le \Delta + \zeta_{i'k}; \quad \zeta_{i'k} \ge 0, \quad i' = 1 \dots m, \; k = 1, 2$$

$$\delta_{il} = \begin{cases} 1; & y_i = l \\ 0; & y_i \ne l \end{cases}$$

The constraints become,

Training samples ($i = 1 \dots n$)

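For any $x_i \in$ class 1, labeled as $y_i = +1$, we have

$$(w_1 - w_1)^\top x_i \ge -\xi_i \;\Rightarrow\; \xi_i \ge 0$$
$$(w_1 - w_2)^\top x_i \ge 1 - \xi_i \;\Rightarrow\; y_i (w_1 - w_2)^\top x_i \ge 1 - \xi_i$$

Similarly, for any $x_i \in$ class 2, labeled as $y_i = -1$, we have

$$(w_2 - w_1)^\top x_i \ge 1 - \xi_i \;\Rightarrow\; y_i (w_1 - w_2)^\top x_i \ge 1 - \xi_i$$
$$(w_2 - w_2)^\top x_i \ge -\xi_i \;\Rightarrow\; \xi_i \ge 0$$

Universum samples ($i' = 1 \dots m$)

For any universum sample $x^*_{i'}$, WLOG we assume $w_1^\top x^*_{i'} \ge w_2^\top x^*_{i'}$. Then:

When $k = 1$ we have $|w_1^\top x^*_{i'} - \max_{l=1,2} w_l^\top x^*_{i'}| \le \Delta + \zeta_{i'k} \Rightarrow 0 \le \Delta + \zeta_{i'k}$, which always holds since $\Delta \ge 0$ and $\zeta_{i'k} \ge 0$.

When $k = 2$ we have $|w_2^\top x^*_{i'} - \max_{l=1,2} w_l^\top x^*_{i'}| \le \Delta + \zeta_{i'k} \Rightarrow |w_2^\top x^*_{i'} - w_1^\top x^*_{i'}| \le \Delta + \zeta_{i'k}$, $\zeta_{i'k} \ge 0$.

Hence, eq. (10) can be re-written as,

$$\min_{w_1, w_2, \xi, \zeta} \; \frac{1}{2}\left(\|w_1\|_2^2 + \|w_2\|_2^2\right) + C\sum_{i=1}^{n}\xi_i + C^*\sum_{i'=1}^{m}\zeta_{i'} \tag{11}$$

$$\text{s.t.} \quad y_i (w_1 - w_2)^\top x_i \ge 1 - \xi_i; \quad \xi_i \ge 0, \quad i = 1 \dots n$$

$$|(w_1 - w_2)^\top x^*_{i'}| \le \Delta + \zeta_{i'}; \quad \zeta_{i'} \ge 0, \quad i' = 1 \dots m$$

The solution to the KKT system of (11) satisfies $w_1 = -w_2$. Hence replacing $w = w_1 - w_2$ in (11) still solves (10). This is the U-SVM formulation in (Weston et al., 2006) with $b = 0$.

A.2 Proof of Proposition 2

The contribution due to the universum samples is the same for both (2) and (3). For any universum sample $x^*_{i'}$ we identify the active constraints and its overall contribution to the objective function through the slack variables.

Equation (2): the overall contribution of the universum sample $x^*_{i'}$ is

$$C^*\sum_{k=1}^{L}\zeta_{i'k} \quad \text{s.t.} \quad \left|w_k^\top x^*_{i'} - \max_{l=1 \dots L} w_l^\top x^*_{i'}\right| \le \Delta + \zeta_{i'k}, \quad \zeta_{i'k} \ge 0, \quad k = 1 \dots L$$

Case 1: If $k = \arg\max_{l=1 \dots L} w_l^\top x^*_{i'}$, the constraint is inactive and $\zeta_{i'k} = 0$.

Case 2: Let $k \ne \arg\max_{l=1 \dots L} w_l^\top x^*_{i'}$. Since $\zeta_{i'k} \ge 0$, the constraint is active if $|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}| > \Delta$. Then $\zeta_{i'k} = \left[-\Delta + |w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}|\right]$.

Hence, keeping only the active constraints, the overall contribution of the sample $x^*_{i'}$ is

$$C^*\sum_{k \in K_{i'}}\left[-\Delta + \left|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}\right|\right] \quad \text{where } K_{i'} = \left\{k \;\middle|\; \left|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}\right| > \Delta\right\} \tag{12}$$

Equation (3): following eq. (3), for the universum sample $x^*_{i'}$ we have $L$ artificial samples $(x^*_{i'}, y_{i'1} = 1), \dots, (x^*_{i'}, y_{i'L} = L)$ stacked at indices $i = n + (i' - 1)L + 1, \dots, n + i'L$. Hence for $x^*_{i'}$ the overall contribution is

$$C^*\sum_{i = n + (i'-1)L + 1}^{n + i'L}\xi_i \quad \text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge -\Delta(1 - \delta_{il}) - \xi_i$$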

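Now, for $i = n + (i' - 1)L + k$, we have $x_i = x^*_{i'}$, $y_i = k$. The constraints are,

$$(w_k - w_1)^\top x^*_{i'} \ge -\Delta - \xi_i \;\Rightarrow\; -(w_k - w_1)^\top x^*_{i'} \le \Delta + \xi_i$$
$$\vdots$$
$$(w_k - w_k)^\top x^*_{i'} \ge -\xi_i \quad (\text{inactive, but ensures } \xi_i \ge 0)$$
$$\vdots$$
$$(w_k - w_L)^\top x^*_{i'} \ge -\Delta - \xi_i \;\Rightarrow\; -(w_k - w_L)^\top x^*_{i'} \le \Delta + \xi_i$$

This is equivalent to $|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}| \le \Delta + \xi_i$. Since $\xi_i \ge 0$, the constraint is active if $|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}| > \Delta$, and the contribution becomes $\xi_i = \left[-\Delta + |w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}|\right]$. Combining all contributions we get,

$$C^*\sum_{i = n + (i'-1)L + 1}^{n + i'L}\xi_i \quad \text{s.t.} \quad (w_{y_i} - w_l)^\top x_i \ge -\Delta(1 - \delta_{il}) - \xi_i$$

$$= C^*\sum_{k \in K_{i'}}\left[-\Delta + \left|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}\right|\right] \quad \text{where } K_{i'} = \left\{k \;\middle|\; \left|w_k^\top x^*_{i'} - \max_{l \ne k} w_l^\top x^*_{i'}\right| > \Delta\right\} \tag{13}$$

Comparing (12) and (13), the universum sample has the same contribution to both objective functions in (2) and (4). This is valid for all universum samples.

A.3 Derivation of Algorithm 1

In this section we provide the KKT system for (4) and the derivation of the dual form in (5). The proof is available in (Crammer & Singer, 2002), (Hsu & Lin, 2002a). We reproduce it for completeness and for better readability of the subsequent proofs. The Lagrangian of the MU-SVM formulation is given as,

$$\mathcal{L} = \frac{1}{2}\sum_{l}\|w_l\|_2^2 + \sum_{i=1}^{n + mL} C_i \xi_i - \sum_{i,l}\eta_{il}\left[(w_{y_i} - w_l)^\top x_i - e_{il} + \xi_i\right] \tag{14}$$

KKT system:

$$\nabla_{w_l}\mathcal{L} = 0 \;\Rightarrow\; w_l = \sum_{i}(C_i\delta_{il} - \eta_{il})\,x_i \tag{15}$$

$$\nabla_{\xi_i}\mathcal{L} = 0 \;\Rightarrow\; \sum_{l}\eta_{il} = C_i$$

Complementary slackness: $\eta_{il}\left[(w_{y_i} - w_l)^\top x_i - e_{il} + \xi_i\right] = 0 \;\; \forall (i, l)$

Constraints: $(w_{y_i} - w_l)^\top x_i - e_{il} + \xi_i \ge 0 \;\; \forall (i, l)$; $\;\eta_{il} \ge 0$

Finally the dual problem is,

$$\max_{\eta} \; -\frac{1}{2}\sum_{i,j}\sum_{l}(C_i\delta_{il} - \eta_{il})(C_j\delta_{jl} - \eta_{jl})K(x_i, x_j) + \sum_{i,l}\eta_{il}e_{il} \tag{16}$$

$$\text{s.t.} \quad \sum_{l}\eta_{il} = C_i, \quad \eta_{il} \ge 0$$

Setting $\alpha_{il} = C_i\delta_{il} - \eta_{il}$ we get (5).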
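The artificial-sample construction used in the proof of Proposition 2 (each universum sample replicated once per class, with margin term $-\Delta(1 - \delta_{il})$ and cost $C^*$) can be sketched as below. This is a hedged illustration under stated assumptions: the function name `augment_with_universum` is hypothetical, classes are indexed from 0 rather than 1, and the returned per-sample costs and margin matrix would still need to be passed to a multiclass SVM solver that accepts them, which is not shown.

```python
def augment_with_universum(X, y, X_univ, n_classes, C, C_star, delta):
    """Stack training samples and L artificial copies of each universum sample.

    Returns (X_aug, y_aug, C_vec, E) where
      C_vec[i] = C for training samples and C_star for artificial samples,
      E[i][l]  = 1 - delta_il               for training samples,
      E[i][l]  = -delta * (1 - delta_il)    for artificial universum samples.
    Class labels are assumed to be 0 .. n_classes - 1 (an assumption of this sketch).
    """
    X_aug = [list(x) for x in X]
    y_aug = list(y)
    C_vec = [C] * len(X)
    E = [[0.0 if l == yi else 1.0 for l in range(n_classes)] for yi in y]

    for x_u in X_univ:
        # One artificial sample per class k, all sharing the same input x_u.
        for k in range(n_classes):
            X_aug.append(list(x_u))
            y_aug.append(k)
            C_vec.append(C_star)
            E.append([0.0 if l == k else -delta for l in range(n_classes)])
    return X_aug, y_aug, C_vec, E
```

With n training samples and m universum samples the augmented problem has n + mL rows, matching the upper limit of the sum in (14).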

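A.4 Proof of Lemma 1

First we prove some interesting properties specific to the MU-SVM solution.

Lemma A.1. $\forall\, \alpha_i \in SV_1 = \{i \mid 0 < \alpha_{il} < C_i;\; y_i = l\}$,

i. $\sum_{k}\alpha_{ik}\left[\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}\right] = 0$; $\; k = 1 \dots L$

ii. $\forall k \ne y_i$ with $\alpha_{ik} < 0$ (strict): $\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik} = \sum_{j}\alpha_{jy_i}K(x_i, x_j) + e_{iy_i}$, i.e. the projection values for the Type 1 support vectors for such classes are equal.

iii. For any $\gamma_i \in \{\gamma_i \mid \sum_{k}\gamma_{ik} = 0;\; \gamma_{ik} = 0 \text{ if } \alpha_i \in SV_1 \text{ and } \alpha_{ik} = 0\}$ we have $\sum_{k}\gamma_{ik}\left[\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}\right] = 0$

Proof. For simplicity we provide the proof for the linear kernel. The same proof applies for non-linear transformations. The proof uses the KKT system for (4) (Appendix A.3).

i. From (15),

$$\sum_{k}\eta_{ik}(w_{y_i} - w_k)^\top x_i = \sum_{k}\eta_{ik}\Big(\sum_{l}\delta_{il}w_l\Big)^\top x_i - \sum_{k}\eta_{ik}w_k^\top x_i$$
$$= C_i\sum_{l}\delta_{il}w_l^\top x_i - \sum_{k}\eta_{ik}w_k^\top x_i = \sum_{k}(C_i\delta_{ik} - \eta_{ik})\,w_k^\top x_i$$
$$= \sum_{k}\alpha_{ik}\sum_{j}\alpha_{jk}K(x_i, x_j)$$

From complementary slackness, if $\alpha_{il} < C_i$ with $y_i = l$, then $\eta_{il} = (C_i\delta_{il} - \alpha_{il}) > 0$. This gives $(w_{y_i = l} - w_{k = l})^\top x_i - e_{ik = l} + \xi_i = 0 \Rightarrow \xi_i = 0$ (i.e. the sample lies on the margin). Now, from complementary slackness in (15),

$$\sum_{k}\eta_{ik}\left[(w_{y_i} - w_k)^\top x_i - e_{ik}\right] = 0$$
$$\Rightarrow \sum_{k}\alpha_{ik}\Big[\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}\Big] = 0 \qquad \big[\because\; \eta_{ik}e_{ik} = (C_i\delta_{ik} - \alpha_{ik})e_{ik} = -\alpha_{ik}e_{ik}\big]$$

ii. From complementary slackness (15),

$$\eta_{ik}\left[(w_{y_i} - w_k)^\top x_i - e_{ik}\right] = 0 \quad (\forall k \ne y_i;\; \alpha_{ik} < 0,\; \xi_i = 0)$$
$$\Rightarrow (w_{y_i} - w_k)^\top x_i - e_{ik} = 0 \quad (\because\; \eta_{ik} > 0)$$
$$\Rightarrow w_{y_i}^\top x_i = w_k^\top x_i + e_{ik}$$
$$\Rightarrow \sum_{j}\alpha_{jy_i}K(x_i, x_j) + e_{iy_i} = \sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}, \quad k \ne y_i$$

iii. For any such $\gamma_i$,

$$\sum_{k}\gamma_{ik}\Big[\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}\Big]$$
$$= \gamma_{iy_i}\sum_{j}\alpha_{jy_i}K(x_i, x_j) + \sum_{k \ne y_i,\; \alpha_{ik} < 0}\gamma_{ik}\Big[\sum_{j}\alpha_{jk}K(x_i, x_j) + e_{ik}\Big]$$
$$= \Big(\gamma_{iy_i} + \sum_{k \ne y_i}\gamma_{ik}\Big)\Big[\sum_{j}\alpha_{jy_i}K(x_i, x_j)\Big] \quad (\text{from ii above and } e_{iy_i} = 1 - \delta_{iy_i} = 0)$$
$$= 0 \quad \Big(\because\; \sum_{k}\gamma_{ik} = 0 \text{ by construction}\Big)$$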

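With the above properties for the MU-SVM solution we provide the proof for Lemma 1 following similar lines as in (Vapnik & Chapelle, 2000). We restate the lemma here for better readability.

Lemma 1. If in the leave-one-out procedure a Type 1 (training) support vector $x_t \in SV_1 \cap T$ is recognized incorrectly, then we have $S_t \max\left(\sqrt{2}D, \frac{1}{\sqrt{C}}\right) > 1$, where

$$S_t^2 = \min_{\beta}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j)$$
$$\text{s.t.} \quad \alpha_{il} - \beta_{il} \le C_i; \quad \forall\{(i \ne t, l) \mid \alpha_{il} < C_i;\; l = y_i\}$$
$$\alpha_{il} - \beta_{il} \le 0; \quad \forall\{(i \ne t, l) \mid \alpha_{il} < 0;\; l \ne y_i\}$$
$$\beta_{il} = 0; \quad \forall (i, l) \notin SV_1 - \{t\}$$
$$\beta_{tl} = \alpha_{tl}\;\; \forall l; \quad \sum_{l}\beta_{il} = 0$$

$D$ = diameter of the smallest hypersphere containing all training samples, and $T$ = training set.

Proof. The leave-one-out formulation for MU-SVM with the $t$-th sample ($t \in T$) dropped is,

$$\max_{\alpha} \; W(\alpha) = -\frac{1}{2}\sum_{i,j}\sum_{l}\alpha_{il}\alpha_{jl}K(x_i, x_j) - \sum_{i,l}\alpha_{il}e_{il} \tag{17}$$
$$\text{s.t.} \quad \sum_{l}\alpha_{il} = 0$$
$$\alpha_{il} \le C_i \text{ if } l = y_i; \quad \alpha_{il} \le 0 \text{ if } l \ne y_i$$
$$\alpha_{tl} = 0\;\; \forall l \quad (\text{additional constraint})$$

Then, the leave-one-out (l.o.o) error is given as $R_{l.o.o} = \frac{1}{n}\sum_{t=1}^{n}\mathbb{1}[y_t \ne \hat{y}_t]$, where $\alpha^t = [\alpha^t_{11}, \dots, \alpha^t_{1L}, \dots, \alpha^t_{t1} = 0, \dots, \alpha^t_{tL} = 0, \dots]$ is the solution of (17) and $\hat{y}_t = \arg\max_{l}\sum_{i}\alpha^t_{il}K(x_i, x_t)$ is the estimated class label for the $t$-th sample. The overall proof of the bound on the l.o.o error follows three major steps.

First, we construct a feasible solution for (5) using the optimal leave-one-out solution $\alpha^t$, i.e., construct $\alpha^t + \gamma$ as shown below,

$$\alpha^t_{il} + \gamma_{il} \le C_i; \quad \forall (i, l) \in \{(i, l) \mid 0 < \alpha^t_{il} < C_i;\; l = y_i\} := A^t_1$$
$$\alpha^t_{il} + \gamma_{il} \le 0; \quad \forall (i, l) \in \{(i, l) \mid \alpha^t_{il} < 0;\; l \ne y_i\} := A^t_2$$
$$\gamma_{il} = 0; \quad \forall (i, l) \notin SV^t_1 \quad [SV^t_1 = A^t_1 \cup A^t_2]$$
$$\sum_{l}\gamma_{il} = 0 \tag{18}$$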

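Now,

$$I_1 = W(\alpha^t + \gamma) - W(\alpha^t)$$
$$= -\frac{1}{2}\sum_{i,j}\sum_{l}(\alpha^t_{il} + \gamma_{il})(\alpha^t_{jl} + \gamma_{jl})K(x_i, x_j) - \sum_{i,l}(\alpha^t_{il} + \gamma_{il})e_{il} + \frac{1}{2}\sum_{i,j}\sum_{l}\alpha^t_{il}\alpha^t_{jl}K(x_i, x_j) + \sum_{i,l}\alpha^t_{il}e_{il}$$
$$= -\frac{1}{2}\sum_{i,j}\Big(\sum_{l}\gamma_{il}\gamma_{jl}\Big)K(x_i, x_j) - \sum_{i,j}\Big(\sum_{l}\gamma_{il}\alpha^t_{jl}\Big)K(x_i, x_j) - \sum_{i,l}\gamma_{il}e_{il}$$
$$= -\frac{1}{2}\sum_{i,j}\Big(\sum_{l}\gamma_{il}\gamma_{jl}\Big)K(x_i, x_j) - \sum_{i,l}\gamma_{il}\Big[\sum_{j}\alpha^t_{jl}K(x_i, x_j) + e_{il}\Big]$$
$$= -\frac{1}{2}\sum_{i,j}\Big(\sum_{l}\gamma_{il}\gamma_{jl}\Big)K(x_i, x_j) - \sum_{l}\gamma_{tl}\Big[\sum_{j}\alpha^t_{jl}K(x_j, x_t) + e_{tl}\Big] \quad (\text{Lemma A.1 (iii)}) \tag{19}$$

As a special case we set $\gamma_t = a\,g_{y_t k}$, where $g_{y_t k} = [\dots, 1, \dots, -1, \dots]$ has $+1$ at the $y_t$-th position, $-1$ at the $k$-th position and zeros elsewhere, and $k = \arg\max_{q \ne y_t}\sum_{j}\alpha^t_{jq}K(x_j, x_t)$. Further, we select another $p \in SV_1$ with $\gamma_p = -\gamma_t = -a\,g_{y_t k}$. Finally, we set $\gamma_i = 0\;\; \forall i \notin \{t, p\}$. For such a case,

$$I_1 = -a^2\|x_t - x_p\|^2 + a\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big]$$
$$\ge -\hat{a}^2 D^2 + \hat{a}\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big] \tag{20}$$

with $\hat{a} = \frac{1}{2D^2}\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big]$ (the value that maximizes the R.H.S. in (20)) and $D$ = diameter of the smallest hypersphere containing all training samples. Now, if $\hat{a} \le C$,

$$I_1 \ge \frac{1}{4D^2}\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big]^2 = \frac{1}{2}\hat{a}\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big]$$

else,

$$I_1 \ge -C^2 D^2 + C\Big[1 - \Big(\sum_{j}\alpha^t_{jy_t}K(x_j, x_t) - \sum_{j}\alpha^t_{jk}K(x_j, x_t)\Big)\Big] = 2CD^2\Big[\hat{a} - \frac{C}{2}\Big] \ge 2CD^2\,\frac{\hat{a}}{2}$$

If there is an error due to the leave-one-out procedure, then $\max_{q \ne y_t}\sum_{j}\alpha^t_{jq}K(x_j, x_t) > \sum_{j}\alpha^t_{jy_t}K(x_j, x_t)$. This gives,

$$I_1 > \frac{1}{2}\min\Big(C, \frac{1}{2D^2}\Big) \quad (\text{for l.o.o error}) \tag{21}$$

Second, we construct a feasible solution for the leave-one-out formulation (17) using the optimal solution of (5), i.e., construct $\alpha - \beta$ as shown below,

$$\alpha_{il} - \beta_{il} \le C_i; \quad \forall (i, l) \in A_1 - \{t\}; \quad A_1 = \{(i, l) \mid 0 < \alpha_{il} < C_i;\; l = y_i\}$$
$$\alpha_{il} - \beta_{il} \le 0; \quad \forall (i, l) \in A_2 - \{t\}; \quad A_2 = \{(i, l) \mid \alpha_{il} < 0;\; l \ne y_i\}$$
$$\sum_{l}\beta_{il} = 0; \quad \beta_{il} = 0\;\; \forall (i, l) \notin SV_1 - \{t\} \tag{22}$$
$$\beta_{tl} = \alpha_{tl}$$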

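with $SV_1 = A_1 \cup A_2 = \{i \mid 0 < \alpha_{iy_i} < C_i\}$, such that it is a feasible solution for (17). As before, define

$$I_2 = W(\alpha) - W(\alpha - \beta)$$
$$= -\frac{1}{2}\sum_{i,j}\sum_{l}\alpha_{il}\alpha_{jl}K(x_i, x_j) - \sum_{i,l}\alpha_{il}e_{il} + \frac{1}{2}\sum_{i,j}\sum_{l}(\alpha_{il} - \beta_{il})(\alpha_{jl} - \beta_{jl})K(x_i, x_j) + \sum_{i,l}(\alpha_{il} - \beta_{il})e_{il}$$
$$= \frac{1}{2}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) - \sum_{i,l}\beta_{il}\Big[\sum_{j}\alpha_{jl}K(x_j, x_i) + e_{il}\Big]$$
$$= \frac{1}{2}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) \quad (\text{Lemma A.1 (iii)}) \tag{23}$$

Third, as the final step define,

$$S_t^2 = \min_{\beta}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) \tag{24}$$
$$\text{s.t.} \quad \alpha_{il} - \beta_{il} \le C_i; \quad \forall (i, l) \in A_1 - \{t\}$$
$$\alpha_{il} - \beta_{il} \le 0; \quad \forall (i, l) \in A_2 - \{t\}$$
$$\beta_{il} = 0; \quad \forall (i, l) \notin SV_1 - \{t\}$$
$$\beta_{tl} = \alpha_{tl}; \quad \sum_{l}\beta_{il} = 0$$

Now, let $\beta^*$ be the minimizer of (24). For such a $\beta^*$,

$$I_2 \left(= \tfrac{1}{2}S_t^2\right) \ge I_1 \quad \big[\because\; W(\alpha) \ge W(\alpha^t + \gamma)\;\forall \gamma;\;\; W(\alpha - \beta) \le W(\alpha^t)\;\forall \beta\big]$$
$$> \frac{1}{2}\min\Big(C, \frac{1}{2D^2}\Big) \quad (\text{from (21)})$$

A.5 Proof of Theorem 1

Theorem 1. The leave-one-out error is upper bounded as:

$$R_{l.o.o} \le \frac{1}{n}\left(|\Psi_1| + |\Psi_2|\right) \tag{25}$$
$$\Psi_2 := \Big\{t \in SV_1 \cap T \;\Big|\; S_t\max\Big(\sqrt{2}D, \frac{1}{\sqrt{C}}\Big) > 1\Big\}$$
$$\Psi_1 := \{t \in SV_2 \cap T\}; \quad |\cdot| := \text{cardinality of a set}$$

where $T$ := training set.

Proof. The proof depends on the contribution of a sample to the leave-one-out error. First, a sample $(x_t, y_t)$ which is not a support vector, i.e. $t \notin SV$ and $t \in T$ (training set), lies outside the margin borders. Dropping such a sample does not change the original solution of (5). Hence, it does not contribute to an error. Secondly, for a sample $(x_t, y_t) \in SV_1 \cap T$ contributing to the leave-one-out error, Lemma 1 holds, i.e. $S_t\max\big(\sqrt{2}D, \frac{1}{\sqrt{C}}\big) > 1$. Finally, every sample $(x_t, y_t)$ with $t \in SV_2 \cap T$ is counted as a leave-one-out error.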
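Once the spans $S_t$ are available, evaluating the bound in (25) is a simple counting exercise. The sketch below is illustrative only: the function name `loo_error_bound` and its argument layout are assumptions, and it presumes the spans of the Type 1 training support vectors, the number of Type 2 training support vectors, the diameter $D$ and the cost $C$ have already been computed elsewhere.

```python
import math


def loo_error_bound(spans_sv1, n_sv2, n_train, D, C):
    """Evaluate (|Psi_1| + |Psi_2|) / n from Theorem 1.

    spans_sv1 : spans S_t of the Type 1 training support vectors
    n_sv2     : number of Type 2 training support vectors (|Psi_1|)
    n_train   : number of training samples n
    D         : diameter of the smallest hypersphere containing the training data
    C         : cost parameter of the training samples
    """
    factor = max(math.sqrt(2.0) * D, 1.0 / math.sqrt(C))
    # Psi_2 counts Type 1 support vectors that satisfy the condition of Lemma 1.
    psi_2 = sum(1 for s in spans_sv1 if s * factor > 1.0)
    psi_1 = n_sv2
    return (psi_1 + psi_2) / n_train
```

Samples that are not support vectors never enter the count, which mirrors the first step of the proof.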

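A.6 Proof of Proposition 3

Remark 2. If the Type 1 training support vectors, i.e. $\{t \mid t \in SV_1 \cap T\}$, for the SVM and MU-SVM solutions remain the same, then we have $S_t^{SVM} \ge S_t^{MU\text{-}SVM}$.

Proof. By definition in Lemma 1,

$$S_t^2 = \min_{\beta \in \beta^{MU\text{-}SVM}}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j)$$

$$\beta^{MU\text{-}SVM} := \left\{\beta \;\middle|\; \begin{array}{ll} \alpha_{il} - \beta_{il} \le C_i; & \forall (i, l) \in A_1 - \{t\} \\ \alpha_{il} - \beta_{il} \le 0; & \forall (i, l) \in A_2 - \{t\} \\ \beta_{il} = 0; & \forall (i, l) \notin SV_1 - \{t\} \\ \beta_{tl} = \alpha_{tl}; \;\; \sum_{l}\beta_{il} = 0 & \end{array}\right\}$$

If the Type 1 (training) support vectors for the SVM and MU-SVM solutions remain the same, we get the same relation as Lemma 1 for C&S SVM with $\beta^{SVM} = \{\beta_{il} \in \beta^{MU\text{-}SVM} \mid \beta_{il} = \alpha_{il};\; \forall i \in SV_1 \cap U\}$, i.e. $\beta^{SVM} \subseteq \beta^{MU\text{-}SVM} \Rightarrow S_t(\beta^{SVM}) \ge S_t(\beta^{MU\text{-}SVM})$, where $U$ = universum samples.

A.7 Proof of Lemma 2

Lemma 2. Under the assumptions (i) and (ii) in Section 3.3 the following equality holds for both Type 1 & 2 training support vectors, i.e. $\forall x_t \in SV \cap T$,

$$S_t^2 = \Big[\alpha_t^\top\sum_{i \in SV}\alpha_i K(x_i, x_t) - \alpha_{ty_t}\,g_{y_t k}^\top\sum_{i \in SV^t}\alpha^t_i K(x_i, x_t)\Big]$$

with $S_t^2 = \big\{\min_{\beta}\sum_{i,j}\big(\sum_{l}\beta_{il}\beta_{jl}\big)K(x_i, x_j) \;\big|\; \beta_{tl} = \alpha_{tl};\; \sum_{l}\beta_{il} = 0;\; \forall (i, j) \in SV_1\big\}$ and $g_{y_t k} = [0, \dots, 1, \dots, -1, \dots, 0]$ ($+1$ at the $y_t$-th position, $-1$ at the $k$-th position); $k = \arg\max_{q \ne y_t}\sum_{j}\alpha^t_{jq}K(x_j, x_t)$.

Proof. Under Assumption (i) we set $\beta = \gamma = (\alpha - \alpha^t)$. Then $I_1 = W(\alpha) - W(\alpha^t) = I_2$. A similar analysis as in (19) gives,

$$I_1 = -\frac{1}{2}\sum_{(i,j) \in SV_1}\Big(\sum_{l}\gamma_{il}\gamma_{jl}\Big)K(x_i, x_j) - \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha^t_{jl}K(x_j, x_t) + e_{tl}\Big] \tag{26}$$

Note the difference in form compared to (19). This is because now the analysis applies to both Type 1 & 2 support vectors. Similarly,

$$I_2 = \frac{1}{2}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) - \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha_{jl}K(x_j, x_t) + e_{tl}\Big] \tag{27}$$

Combining (26) and (27),

$$\sum_{(i,j) \in SV_1}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) = \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha_{jl}K(x_j, x_t) + e_{tl}\Big] - \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha^t_{jl}K(x_j, x_t) + e_{tl}\Big] \tag{28}$$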

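Next, let $\beta^*$ be the minimizer of (24). Then $(\alpha - \beta^*)$ is a feasible solution for (17). Hence,

$$W(\alpha^t) \ge W(\alpha - \beta^*) \;\Rightarrow\; W(\alpha) - W(\alpha^t) \le W(\alpha) - W(\alpha - \beta^*) \;\Rightarrow\; \sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) \le S_t^2$$

However, from Assumption (i), $\beta = (\alpha - \alpha^t)$ is a feasible solution for (24). Hence for such a $\beta$ we have $S_t^2 \le \sum_{i,j}\big(\sum_{l}\beta_{il}\beta_{jl}\big)K(x_i, x_j)$. Combining the above inequalities,

$$S_t^2 = \sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) \tag{29}$$

Further, under Assumption (i) the inequality constraints in (24) are not activated. Hence, $S_t^2 = \big\{\min_{\beta}\sum_{i,j}\big(\sum_{l}\beta_{il}\beta_{jl}\big)K(x_i, x_j) \;\big|\; \beta_{tl} = \alpha_{tl};\; \sum_{l}\beta_{il} = 0;\; \forall (i, j) \in SV_1\big\}$. Finally, combining (28) and (29) we get,

$$S_t^2 = \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha_{jl}K(x_j, x_t) + e_{tl}\Big] - \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha^t_{jl}K(x_j, x_t) + e_{tl}\Big] \tag{30}$$

For a leave-one-out error (under Assumption (ii)),

$$\sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha^t_{jl}K(x_j, x_t)\Big] = -\alpha_{ty_t}\Big[\sum_{j \in SV}\alpha^t_{jk}K(x_j, x_t) - \sum_{j \in SV}\alpha^t_{jy_t}K(x_j, x_t)\Big] \le 0 \quad \Big(k = \arg\max_{m \ne y_t}\sum_{j}\alpha^t_{jm}K(x_j, x_t)\Big)$$

$$\Rightarrow\; S_t^2 \ge \sum_{l}\alpha_{tl}\Big[\sum_{j \in SV}\alpha_{jl}K(x_j, x_t)\Big]$$

A.8 Proof of Lemma 3

Lemma 3. The span $S_t^2$ can be efficiently computed as

$$S_t^2 = \begin{cases} \alpha_t^\top\left[(H^{-1})_{tt}\right]^{-1}\alpha_t & t \in SV_1 \cap T \\ \alpha_t^\top\left[K(x_t, x_t) \otimes I_L - K_t^\top H^{-1}K_t\right]\alpha_t & t \in SV_2 \cap T \end{cases}$$

Here,

$$H := \begin{bmatrix} K_{SV_1} \otimes I_L & A^\top \\ A & 0 \end{bmatrix}; \quad A := I_{|SV_1|} \otimes (1_L)^\top; \quad 1_L = [\underbrace{1\; 1\; \dots\; 1}_{L \text{ elements}}]^\top$$

$(H^{-1})_{tt}$ := sub-matrix of $H^{-1}$ for indices $i = (t-1)L + 1, \dots, tL$; $K_{SV_1}$ := kernel matrix of the Type 1 support vectors; and $K_t = [(k_t^\top \otimes I_L)\;\; 0_{L \times |SV_1|}]^\top$, where $k_t$ is the $n_{SV_1} \times 1$ dimensional vector whose $i$-th element is $K(x_i, x_t)$, $\forall x_i \in SV_1$, and $\otimes$ is the Kronecker product.

Proof. The span is defined as:

$$S_t^2 = \min_{\beta}\sum_{i,j}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) \tag{31}$$
$$\text{s.t.} \quad \beta_{tl} = \alpha_{tl}; \quad l = 1, \dots, L$$
$$\sum_{l}\beta_{il} = 0; \quad \forall (i, j) \in SV_1$$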
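The Type 2 case of Lemma 3 can be evaluated numerically with a few Kronecker products. The following NumPy sketch is a hedged illustration rather than the authors' implementation: the function name `span_sq_type2` is hypothetical, and it assumes $K_t$ is built from $k_t^\top \otimes I_L$ so that the matrix dimensions agree with $H$. It solves a linear system with $H$ instead of forming $H^{-1}$ explicitly.

```python
import numpy as np


def span_sq_type2(K_sv1, k_t, K_tt, alpha_t):
    """S_t^2 for a Type 2 support vector t, following the second case of Lemma 3.

    K_sv1   : (s, s) kernel matrix of the Type 1 support vectors
    k_t     : (s,) vector with entries K(x_i, x_t) for x_i in SV_1
    K_tt    : scalar K(x_t, x_t)
    alpha_t : (L,) dual coefficients of sample t (they sum to zero)
    """
    s = K_sv1.shape[0]
    L = alpha_t.shape[0]
    # A = I_s kron 1_L^T: row i sums the L coefficients of support vector i.
    A = np.kron(np.eye(s), np.ones((1, L)))
    H = np.block([[np.kron(K_sv1, np.eye(L)), A.T],
                  [A, np.zeros((s, s))]])
    # K_t = [(k_t^T kron I_L)  0_{L x s}]^T, shape (s*L + s, L).
    K_t = np.hstack([np.kron(k_t.reshape(1, -1), np.eye(L)),
                     np.zeros((L, s))]).T
    M = K_tt * np.eye(L) - K_t.T @ np.linalg.solve(H, K_t)
    return float(alpha_t @ M @ alpha_t)
```

As a sanity check, with a single Type 1 support vector, $K_{SV_1} = [2]$, $k_t = [1]$, $K(x_t, x_t) = 3$ and $\alpha_t = [0.5, -0.5]$, minimizing (31) by hand gives $\|\alpha_t\|^2 (3 - 1^2/2) = 1.25$, which is the value the sketch returns.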

Case ($t \in SV_1$)

$$S_t^2 = \min_{\beta}\; \sum_{l}\alpha_{tl}\alpha_{tl}K(x_t, x_t) + 2\sum_{i \in SV_1 - \{t\}}\sum_{l}\alpha_{tl}\beta_{il}K(x_t, x_i) + \sum_{(i,j) \in SV_1 - \{t\}}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j)$$
$$\text{s.t.} \quad \underbrace{\big(I_{|SV_1 - \{t\}|} \otimes 1_L^\top\big)}_{A}\,\beta = 0$$

$$= \min_{\beta}\max_{\mu}\; \alpha_t^\top\left[K(x_t, x_t) \otimes I_L\right]\alpha_t + 2\sum_{i \in SV_1 - \{t\}}\sum_{l}\alpha_{tl}\beta_{il}K(x_t, x_i) + \sum_{(i,j) \in SV_1 - \{t\}}\Big(\sum_{l}\beta_{il}\beta_{jl}\Big)K(x_i, x_j) + 2\mu^\top A\beta + 2\alpha_t^\top A_{tt}^\top\mu$$

($\mu$ := Lagrange multiplier; $\because\; \sum_{l}\alpha_{tl} = 0 \Rightarrow \alpha_t^\top A_{tt}^\top\mu = 0$)

$$= \alpha_t^\top\left[K(x_t, x_t) \otimes I_L\right]\alpha_t + \min_{\beta}\max_{\mu}\;\underbrace{2\alpha_t^\top\big(H^{(-t)}_t\big)^\top\lambda + \lambda^\top H^{(-t)}\lambda}_{\mathcal{L}(\lambda)} \quad (\text{with } \lambda = [\beta; \mu])$$

where $I_{|SV_1 - \{t\}|}$ := identity matrix of size $|SV_1 - \{t\}|$; $A_{tt}$ := sub-matrix of $A$ for indices $(t-1)L + 1, \dots, tL$; $H^{(-t)}$ := matrix $H$ (in Lemma 3) with the rows/columns $(t-1)L + 1, \dots, tL$ removed; and $H^{(-t)}_t$ := columns $(t-1)L + 1, \dots, tL$ of $H$. Further, at the saddle point, $\nabla_{\lambda}\mathcal{L}(\lambda) = 0 \Rightarrow \lambda = -[H^{(-t)}]^{-1}H^{(-t)}_t\alpha_t$. Hence,

$$S_t^2 = \alpha_t^\top\Big[\big(K(x_t, x_t) \otimes I_L\big) - \big(H^{(-t)}_t\big)^\top\big(H^{(-t)}\big)^{-1}H^{(-t)}_t\Big]\alpha_t = \alpha_t^\top\left[(H^{-1})_{tt}\right]^{-1}\alpha_t \tag{32}$$

where $(H^{-1})_{tt}$ := sub-matrix of $H^{-1}$ for indices $i = (t-1)L + 1, \dots, tL$.

Case ($t \in SV_2$)

A similar analysis as above gives,

$$S_t^2 = \alpha_t^\top\left[K(x_t, x_t) \otimes I_L - K_t^\top H^{-1}K_t\right]\alpha_t \tag{33}$$

where $K_t = [(k_t^\top \otimes I_L)\;\; 0_{L \times |SV_1|}]^\top$ and $k_t$ is the $n_{SV_1} \times 1$ dimensional vector whose $i$-th element is $K(x_i, x_t)$, $\forall x_i \in SV_1$.

A.9 Proof of Theorem 2

The proof has two steps. First, a sample $(x_t, y_t)$ which is not a support vector does not contribute to an error. Secondly, for a sample $(x_t, y_t)$ with $t \in SV \cap T$, Lemma 2 holds. Combining these with the form of $S_t^2$ in Lemma 3 completes the proof.

B Additional Results

B.1 ECOC vs. Direct Approach for MU-SVM

This section compares two major ECOC-based approaches, one-vs-all (OVA) and one-vs-one (OVO), with the direct formulation (the C&S-based MU-SVM in (2)). For the ECOC-based approaches we use the standard U-SVM formulation (Weston et al., 2006) to solve the binary problems. Further, we use the same datasets and experimental settings as discussed in Section 4. For all the datasets we show the results for the universum types which provided the best performance in Table 2.
As shown in Table 4, for the datasets and experimental settings used in this paper, the C&S-based direct formulation (MU-SVM) performs as well as, or better than, the ensemble-based methods.

Table 4: Mean (± standard deviation) test error in % over 10 runs.

DATA SET | METHOD | ONE VS ALL | ONE VS ONE | C&S (MU-SVM)
GTSRB | SVM | 7.07 ± | ± | ± 1.16
GTSRB | U-SVM (PRIORITY-ROAD) | 6.05 ± | ± | ± 0.32
ABCDETC | SVM | 28.1 ± | ± | ± 3.34
ABCDETC | U-SVM (RA) | ± | ± | ± 2.89
ISOLET | SVM | 3.72 ± | ± | ± 0.31
ISOLET | U-SVM (RA) | 3.56 ± | ± | ± 0.32

B.2 SVM vs. MU-SVM using all training classes

Table 5: Performance comparisons between SVM and MU-SVM using all training classes.

GTSRB: # TRAIN / TEST = 700 / 3500 (100 / 500 PER CLASS), # UNIVERSUM (M) = 500
SVM | MU-SVM (PRIORITY-ROAD) | MU-SVM (RA) | MU-SVM (NON-SPEED)
± | ± | ± | ±

ABCDETC: # TRAIN / TEST = 1500 / 1000 (150 / 100 PER CLASS), # UNIVERSUM (M) = 300
SVM | UPPER | LOWER | SYMBOLS | RA
42.1 ± | ± | ± | ± | ± 2.1

B.3 Performance comparisons for several Universum types with varying Training set size for GTSRB dataset

The experiments follow the same setting as in Table 2. However, in this case we vary the number of training samples. The universum set size is fixed to m = 500 following Table 2, since a further increase in universum samples does not provide significant performance gains. Table 6 provides the mean and standard deviation of the test errors for the SVM and MU-SVM models over 10 random training/test partitionings of the dataset.

Table 6: Mean (± standard deviation) of the test errors (in %) over 10 runs for the GTSRB dataset. Columns give the no. of training samples (per class). All MU-SVM rows use 500 universum samples.

METHOD | 300 (100) | 750 (250) | 1500 (500)
C&S SVM | 7.54 ± | ± | ± 0.38
MU-SVM (NO PASSING) | 6.98 ± | ± | ±
MU-SVM (NO PASSING FOR TRUCKS) | ± | ± | ± 0.41
MU-SVM (RIGHT OF WAY) | 6.17 ± | ± | ±
MU-SVM (PRIORITY ROAD) | ± | ± | ±
MU-SVM (YIELD RIGHT OF WAY) | ± | ± | ±
MU-SVM (STOP) | ± | ± | ±
MU-SVM (NO VEHICLES) | ± | ± | ± 0.24
MU-SVM (NO ENTRY) | 6.17 ± | ± | ±
MU-SVM (DANGER) | ± | ± | ±
MU-SVM (SLIPPERY ROAD) | ± | ± | ± 0.62
MU-SVM (RA) | 6.98 ± | ± | ± 0.54
MU-SVM (NON SPEED) | 7.46 ± | ± | ± 0.4

Figure 9: Typical histogram of projections of training samples (n = 750, shown in blue) and priority-road universum samples (m = 500, shown in red). SVM decision functions (with C = 0.1) for (a) sign 30, (b) sign 70, (c) sign 80; (d) frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.5$, $\Delta = 0.1$) for (e) sign 30, (f) sign 70, (g) sign 80; (h) frequency plot of predicted labels for universum samples using the MU-SVM model.

Figure 10: Typical histogram of projections of training samples (n = 1500, shown in blue) and priority-road universum samples (m = 500, shown in red). SVM decision functions (with C = 0.1) for (a) sign 30, (b) sign 70, (c) sign 80; (d) frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 1$, $\Delta = 0.05$) for (e) sign 30, (f) sign 70, (g) sign 80; (h) frequency plot of predicted labels for universum samples using the MU-SVM model.

Table 6 shows that MU-SVM with the priority-road universum provides the best performance. Further, the performance gains due to MU-SVM shrink as the number of training samples increases. For further analysis of this result we use the histogram of projections method. The histograms of projections for the priority-road universum with increased training samples n = 750 and n = 1500 are provided in Figs. 9 and 10 respectively. As seen from the figures, when the number of training samples is large, the estimation problem becomes well-posed and the SVM model does not exhibit a large data-piling effect about the +1 margin borders (compared to Fig. 3). In such cases, application of MU-SVM does not provide a significant improvement over the SVM solution. This is consistent with the results reported in (Cherkassky et al., 2011) for binary U-SVM. This shows that MU-SVM is typically effective for (ill-conditioned) high-dimension low-sample-size settings.
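The quantities plotted in these figures can be computed from the per-class decision values. The sketch below is an illustration under explicit assumptions: it takes the projection of a sample onto the decision function of class k to be $f_k(x) - \max_{l \ne k} f_l(x)$, which is one common multiclass convention and is assumed here rather than quoted from this appendix; the function name `projections_and_votes` is hypothetical, and classes are indexed from 0.

```python
def projections_and_votes(scores):
    """Per-class projection values and predicted-label frequencies.

    scores : list of per-sample lists [f_0(x), ..., f_{L-1}(x)] of decision values
    Returns (proj, votes) where proj[k] holds f_k(x) - max_{l != k} f_l(x) for
    every sample (assumed projection convention) and votes[k] counts the samples
    whose predicted label (argmax of the scores) is k.
    """
    L = len(scores[0])
    proj = [[] for _ in range(L)]
    votes = [0] * L
    for s in scores:
        for k in range(L):
            best_other = max(s[l] for l in range(L) if l != k)
            proj[k].append(s[k] - best_other)
        # Ties are resolved in favour of the lowest class index.
        votes[max(range(L), key=lambda l: s[l])] += 1
    return proj, votes
```

Applied to the universum samples, `proj[k]` gives the values to histogram in the per-class panels, and `votes` gives the frequency plot of predicted labels; a flat `votes` profile corresponds to the near-random prediction discussed in the text.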

B.4 Additional Histogram of Projections

This section provides the histograms of projections for the modeling results on the ABCDETC and ISOLET datasets. The experimental settings are discussed in Section 4.1.

B.4.1 ABCDETC Dataset

Figure 11: Typical histogram of projections of training samples (n = 600, shown in blue) and upper-case letter universum samples (m = 1000, shown in red). SVM decision functions (with $C = 1$, $\gamma = 2^{-7}$) for (a) digit 0, (b) digit 1, (c) digit 2, (d) digit 3; (e) frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.15$, $\Delta = 0$) for (f) digit 0, (g) digit 1, (h) digit 2, (i) digit 3; (j) frequency plot of predicted labels for universum samples using the MU-SVM model.

Figure 12: Same layout and parameter settings as Figure 11, with lower-case letters as the universum samples (m = 1000).

Figure 13: Same layout and parameter settings as Figure 11, with symbols as the universum samples (m = 1000).

Figure 14: Same layout and parameter settings as Figure 11, with random averaging (RA) as the universum samples (m = 1000).

As seen from Figs. 11-14:

Upper: the SVM model results in a narrow distribution of the universum samples and in turn provides near-random prediction on the universum samples. Applying MU-SVM in this case provides no significant change to the multiclass SVM solution and hence no additional improvement in generalization (see Table 2).

Lower: the SVM model results in a relatively wider distribution of the universum samples (compared to Upper). Applying MU-SVM in this case provides some improvement over the multiclass SVM (see Table 2).

Symbol and RA: the SVM model results in a wide distribution of the universum samples. Further, in both cases the universum samples are mostly predicted as digit 1. Applying MU-SVM in this case results in a narrow distribution of the universum samples and increases the uncertainty on the universum samples. This results in a significant improvement over the multiclass SVM solution (see Table 2).

B.4.2 ISOLET Dataset

Figure 15: Typical histogram of projections of training samples (n = 500, shown in blue) and Others universum samples (m = 1000, shown in red). SVM decision functions (with $C = 1$, $\gamma = 2^{-7}$) for (a) letter a, (b) letter b, (c) letter c, (d) letter d, (e) letter e; (f) frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.1$, $\Delta = 0.05$) for (g) letter a, (h) letter b, (i) letter c, (j) letter d, (k) letter e; (l) frequency plot of predicted labels for universum samples using the MU-SVM model.

Figure 16: Typical histogram of projections of training samples (n = 500, shown in blue) and RA universum samples (m = 1000, shown in red). SVM decision functions (with $C = 1$, $\gamma = 2^{-7}$) for (a) letter a, (b) letter b, (c) letter c, (d) letter d, (e) letter e; (f) frequency plot of predicted labels for universum samples using the SVM model. MU-SVM decision functions (with $C^*/C = 0.1$, $\Delta = 0.1$) for (g) letter a, (h) letter b, (i) letter c, (j) letter d, (k) letter e; (l) frequency plot of predicted labels for universum samples using the MU-SVM model.

As seen from Figs. 15-16:

Others: the SVM model results in a near-random prediction on the universum samples. Applying MU-SVM in this case reduces the projection of the universum samples but does not result in a significant increase in the uncertainty of the universum samples, and hence provides no additional improvement in generalization (see Table 2).

RA: the SVM model results in a wide distribution of the universum samples. Further, the universum samples are mostly predicted as letter d. Applying MU-SVM in this case results in a narrow distribution of the universum samples and increases the uncertainty on the universum samples. This results in a significant improvement over the multiclass SVM solution (see Table 2).

B.5 Comparison of the error estimates using 5-Fold CV vs. Theorem 2

This section provides the error estimate curves for the different model parameters for the datasets in Table 1.

B.5.1 GTSRB dataset

WITH VARYING $C^*/C$

Figure 17: Performance of MU-SVM with RA universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$, $C = 1$, $\Delta = 0$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 18: Performance of MU-SVM with Non-Speed universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$, $C = 1$, $\Delta = 0$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

WITH VARYING $\Delta$

Figure 19: Performance of MU-SVM with RA universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $C^*/C = \frac{n}{mL} = 0.1$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 20: Performance of MU-SVM with Non-Speed universum for the GTSRB dataset. Here, no. of training samples n = 300, no. of universum samples m = 1000. (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $C^*/C = \frac{n}{mL} = 0.1$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

B.5.2 ABCDETC dataset

WITH VARYING $C^*/C$

Figure 21: Performance of MU-SVM with Upper-case universum for the ABCDETC dataset. Here, no. of training samples n = 600, no. of universum samples m = 1000. (a) Error estimates for the model parameters $\Delta = 0$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 22: Performance of MU-SVM with Lower-case universum for the ABCDETC dataset. Here, no. of training samples n = 600, no. of universum samples m = 1000. (a) Error estimates for the model parameters $\Delta = 0$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$. (b) Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 23: Performance of MU-SVM with Symbol universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = 0$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 24: Performance of MU-SVM with RA universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = 0$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$. Ranking of the model parameters with the smallest error estimate over each experiment.

WITH VARYING $\Delta$

Figure 25: Performance of MU-SVM with Upper universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = \frac{n}{mL} = 0.15$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 26: Performance of MU-SVM with Lower universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = \frac{n}{mL} = 0.15$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 27: Performance of MU-SVM with Symbol universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = \frac{n}{mL} = 0.15$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 28: Performance of MU-SVM with RA universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $\gamma = 2^{-7}$, $C^*/C = \frac{n}{mL} = 0.15$. Ranking of the model parameters with the smallest error estimate over each experiment.

B.5.3 ISOLET dataset

WITH VARYING $C^*/C$

Figure 29: Performance of MU-SVM with Others universum for the ISOLET dataset. Here, no. of training samples (n = 500), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$, $C = 1$, $\Delta = 0$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 30: Performance of MU-SVM with RA universum for the ISOLET dataset. Here, no. of training samples (n = 500), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $C^*/C = [10^{-3}, 10^{-2}, 10^{-1}, 10^{0}]$, $C = 1$, $\Delta = 0$. Ranking of the model parameters with the smallest error estimate over each experiment.

WITH VARYING $\Delta$

Figure 31: Performance of MU-SVM with RA universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $C^*/C = \frac{n}{mL} = 0.1$. Ranking of the model parameters with the smallest error estimate over each experiment.

Figure 32: Performance of MU-SVM with Others universum for the ABCDETC dataset. Here, no. of training samples (n = 600), no. of universum samples (m = 1000). (a) Error estimates for the model parameters $\Delta = [0, 0.01, 0.05, 0.1]$, $C = 1$, $C^*/C = \frac{n}{mL} = 0.1$. Ranking of the model parameters with the smallest error estimate over each experiment.
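Each caption in this section refers to a ranking of the model parameters by smallest error estimate in each experiment. One way such a tally might be computed is sketched below. This is an illustration only, not the authors' code: the function name is hypothetical, and the error values are invented placeholders rather than results from the paper. The sketch counts, for each grid value, how many experiments select it as the minimizer, so that the choices made by two estimators (for example 5-fold CV and an analytic bound) can be compared.

```python
from collections import Counter


def best_parameter_counts(estimates, grid):
    """Count how often each grid value attains the smallest error estimate.

    estimates: one row per experiment, one error estimate per grid value.
    Ties are resolved in favour of the earlier grid value.
    """
    counts = Counter({value: 0 for value in grid})
    for row in estimates:
        best_index = min(range(len(grid)), key=lambda j: row[j])
        counts[grid[best_index]] += 1
    return dict(counts)


# Hypothetical grid for C*/C and made-up error estimates (3 experiments).
grid = [1e-3, 1e-2, 1e-1, 1e0]
cv_estimates = [
    [0.30, 0.25, 0.20, 0.28],
    [0.31, 0.22, 0.24, 0.29],
    [0.33, 0.27, 0.21, 0.26],
]
bound_estimates = [
    [0.40, 0.35, 0.30, 0.37],
    [0.42, 0.36, 0.33, 0.39],
    [0.41, 0.38, 0.31, 0.36],
]

print(best_parameter_counts(cv_estimates, grid))
print(best_parameter_counts(bound_estimates, grid))
```

If the two tallies concentrate on the same grid values, the two estimators lead to the same model selection even when their absolute error estimates differ.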


More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Sparse Semi-supervised Learning Using Conjugate Functions

Sparse Semi-supervised Learning Using Conjugate Functions Journa of Machine Learning Research (200) 2423-2455 Submitted 2/09; Pubished 9/0 Sparse Semi-supervised Learning Using Conjugate Functions Shiiang Sun Department of Computer Science and Technoogy East

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

A Ridgelet Kernel Regression Model using Genetic Algorithm

A Ridgelet Kernel Regression Model using Genetic Algorithm A Ridgeet Kerne Regression Mode using Genetic Agorithm Shuyuan Yang, Min Wang, Licheng Jiao * Institute of Inteigence Information Processing, Department of Eectrica Engineering Xidian University Xi an,

More information

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING 8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

Convolutional Networks 2: Training, deep convolutional networks

Convolutional Networks 2: Training, deep convolutional networks Convoutiona Networks 2: Training, deep convoutiona networks Hakan Bien Machine Learning Practica MLP Lecture 8 30 October / 6 November 2018 MLP Lecture 8 / 30 October / 6 November 2018 Convoutiona Networks

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

Target Location Estimation in Wireless Sensor Networks Using Binary Data

Target Location Estimation in Wireless Sensor Networks Using Binary Data Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344

More information

Kernel pea and De-Noising in Feature Spaces

Kernel pea and De-Noising in Feature Spaces Kerne pea and De-Noising in Feature Spaces Sebastian Mika, Bernhard Schokopf, Aex Smoa Kaus-Robert Muer, Matthias Schoz, Gunnar Riitsch GMD FIRST, Rudower Chaussee 5, 12489 Berin, Germany {mika, bs, smoa,

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

Chapter 1 Decomposition methods for Support Vector Machines

Chapter 1 Decomposition methods for Support Vector Machines Chapter 1 Decomposition methods for Support Vector Machines Support Vector Machines (SVM) are widey used as a simpe and efficient too for inear and noninear cassification as we as for regression probems.

More information

Learning Gaussian Processes from Multiple Tasks

Learning Gaussian Processes from Multiple Tasks Kai Yu kai.yu@siemens.com Information and Communication, Corporate Technoogy, Siemens AG, Munich, Germany Voker Tresp voker.tresp@siemens.com Information and Communication, Corporate Technoogy, Siemens

More information