Efficient Minimax Clustering Probability Machine by Generalized Probability Product Kernel

Haiqin Yang, Kaizhu Huang, Irwin King and Michael R. Lyu

Abstract—Minimax Probability Machine (MPM), which learns a decision function by minimizing the maximum probability of misclassification, has demonstrated very promising performance in classification and regression. However, MPM is often challenged for its slow training and test procedures. To solve this problem, we propose an efficient model named Minimax Clustering Probability Machine (MCPM). Following many traditional methods, we represent the training data points by several clusters. Different from these methods, a Generalized Probability Product Kernel is appropriately defined to grasp the inner distributional information over the clusters. By incorporating clustering information via a non-linear kernel, MCPM trains and tests quickly on classification problems with promising performance. Another appealing property of the proposed approach is that MCPM can still derive an explicit worst-case accuracy bound for the decision boundary. Experimental results on synthetic and real data validate the effectiveness of MCPM for classification while attaining high accuracy.

I. INTRODUCTION
Minimax Probability Machine (MPM) is a recently proposed learning model that has demonstrated advantages in solving classification problems [10]. By minimizing the maximum probability of misclassification of future data points, MPM has shown competitive classification accuracy against the state-of-the-art classifier, the Support Vector Machine (SVM). One appealing feature of MPM is that it can derive an explicit worst-case accuracy bound for the decision boundary. Following the idea of MPM, there have been many important extensions, e.g., the worst-case optimal Bayesian classification model [7], its regression extension [19], and the Biased Minimax Probability Machine for imbalanced classification [5] and medical diagnosis [6].
However, MPM and its extensions are often challenged for their time-consuming training and test procedures. Training MPM is equivalent to solving a Second Order Cone Programming (SOCP) problem, whose worst-case complexity is O(n³) (n is the number of training samples for the kernelized MPM). The test complexity of the kernelized MPM is also related to the number of training samples. This makes MPM-based models inefficient for classifying large datasets.

Haiqin Yang, Kaizhu Huang, Irwin King and Michael R. Lyu are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (email: {hqyang,kzhuang,king,lyu}@cse.cuhk.edu.hk). The work described in this paper is supported by two grants, one from the CUHK Direct Grant # , and the other from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK4150/07E).

In solving large-scale classification problems, the state-of-the-art classifier, the Support Vector Machine, faces the same problem. Although various improvements, e.g., Sequential Minimal Optimization [16], [9] and parallel methods [2], [4], have been made to speed up SVM training, the training complexity of SVMs is still high when the number of training samples is large. To solve this problem, clustering-based SVMs, e.g., CB-SVMs [20] and the Support Clustering Machine (SCM) [21], [12], have been proposed to select representative quanta, e.g., typical points or clusters, for SVM training, so as to reduce the training complexity. Motivated by the idea of clustering-based SVMs, we propose the Minimax Clustering Probability Machine (MCPM) to extend MPM to large-scale classification problems. The main idea of MCPM is as follows. The training samples are clustered in advance to produce a set of generative models. The obtained clusters, described by certain distributions, are then input as the training units, while the test samples are interpreted as special clusters centered on each specific data point.
Instead of applying the probability product kernel to measure similarity as used in [12], we define a novel generalized probability product kernel, in particular a Radial Basis Function on the probability product kernel, to measure the similarity either between any two clusters (in training) or between a cluster and a test vector (in test). Finally, the decision function can be constructed in a kernel form that depends only on the training clusters. Experiments on both synthetic and real data show that the proposed MCPM reduces the computational costs in both the training phase and the test phase, while preserving the classification accuracy.

The proposed generalized probability product kernel has several advantages over the traditional probability product kernel as used in SCM [12]. First, as we show in this paper, the traditional probability product kernel is actually a linear kernel defined in the probability space, while our generalized kernel describes a non-linear kernel which can generate more complex similarity measures. Second, numerical problems such as large variance in the kernel matrix sometimes occur when the traditional probability product kernel is employed. These numerical problems often require careful data adaptation, e.g., scaling up the kernel matrix, making the training sometimes not as straightforward as expected. In contrast, the proposed generalized probability product kernel avoids such problems by projecting the probabilities into a non-linear space; the whole learning process is easy to implement and requires no data adaptation. Third, the generalized probability product kernel is more flexible in measuring the similarity. This is analogous to the case that non-linear RBF kernels can usually outperform the linear kernel. Hence the proposed generalized kernel defined over probabilities is often more accurate than the standard kernel. Empirical evidence on real data also supports this statement, as seen later in the experiments.

/08/$25.00 © 2008 IEEE

The contributions of the paper are summarized as follows. (1) The proposed MCPM largely reduces both the computational and the spatial costs for both training and test, while keeping the classification accuracy; (2) MCPM keeps the statistical information of the training samples by representing them as generative models; (3) by defining Radial Basis Functions on the probability product kernel, the similarity measurement can deliver more information for classification; (4) MCPM provides a worst-case accuracy bound for classifying future data points; (5) MCPM can be implemented easily by using the generalized kernel.

The rest of this paper is organized as follows. Section II derives the MCPM under a probability framework similar to that of the original MPM. Section III defines the probability product kernel and introduces a generalized probability product kernel to measure the similarity either between a pair of clusters or between a cluster and a test sample. Section IV reports the experimental setup and results on both a synthetic dataset and real datasets. Finally, the paper is concluded in Section V.

II. CLASSIFICATION MODEL
In this section, we first give a sketch introduction to the MPM. We then formulate the MCPM in Subsection II-B.

A. Minimax Probability Machine for Binary Classification
Consider a binary classification problem and suppose the data are generated from two classes, x and y. Data of class x are drawn from a class of distributions with mean and covariance matrix {x̄, Σ_x}, while data of class y are from another class of distributions with mean and covariance matrix {ȳ, Σ_y}, where x, y, x̄, ȳ ∈ R^d and Σ_x, Σ_y ∈ R^{d×d}.
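For concreteness, the class statistics {x̄, Σ_x} and {ȳ, Σ_y} assumed above would in practice be plug-in estimates from the training samples; a minimal sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class training sample in R^2, 100 points per class.
X = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
Y = rng.normal(loc=3.0, scale=1.0, size=(100, 2))

# Plug-in estimates of {x_bar, Sigma_x} and {y_bar, Sigma_y}.
x_bar, Sigma_x = X.mean(axis=0), np.cov(X, rowvar=False)
y_bar, Sigma_y = Y.mean(axis=0), np.cov(Y, rowvar=False)

print(x_bar.shape, Sigma_x.shape)  # (2,) (2, 2)
```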
Assuming {x̄, Σ_x} and {ȳ, Σ_y} for the two classes are reliable, MPM attempts to determine the hyperplane H(a, b) = {z | aᵀz = b} (a ∈ R^d \ {0}, b ∈ R, where the superscript ᵀ denotes the transpose) that separates the two classes of data with maximal probability. The formulation of the MPM model is written as follows:

max_{α, a≠0, b} α
s.t. inf_{x ~ (x̄, Σ_x)} Pr{aᵀx ≥ b} ≥ α,    (1)
     inf_{y ~ (ȳ, Σ_y)} Pr{aᵀy ≤ b} ≥ α,

where α represents the worst-case accuracy of classifying future data points. A future point z with aᵀz ≥ b is then classified as belonging to the class associated with x; otherwise it is judged as belonging to the class associated with y. This derived decision hyperplane is claimed to minimize the worst-case (maximal) probability of misclassification, or the worst-case error rate, of future data. Further, applying the generalization of Marshall and Olkin's result [14], [17], the optimization of MPM can be transformed into a Second Order Cone Programming (SOCP) problem as follows [13], [15]:

min_a ‖Σ_x^{1/2} a‖₂ + ‖Σ_y^{1/2} a‖₂   s.t.  aᵀ(x̄ − ȳ) = 1.    (2)

The worst-case complexity of solving the MPM optimization, i.e., the SOCP problem in Eq. (2), is O(n³), where n is the number of training samples for the kernelized MPM. In the test phase, the complexity of the kernelized MPM also depends on the number of training samples. This high computational complexity is a main obstacle to applying MPM in real applications.

B. Minimax Clustering Probability Machine
Aiming at reducing the computational complexity of MPM, we propose the Minimax Clustering Probability Machine (MCPM). The idea is as follows. The training samples of class x and the training samples of class y are clustered into M_cx and M_cy training clusters, respectively. Following the Gaussian distribution assumption, we denote the training clusters as generative models, i.e., c_j = (P_j, μ_j, Σ_j), where P_j, μ_j, Σ_j are the prior (weight), the mean, and the covariance matrix of the j-th cluster.
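As a concrete illustration, the generative models c_j = (P_j, μ_j, Σ_j) can be produced by any clustering step; the sketch below uses plain k-means purely as a hypothetical stand-in for the clustering algorithm, with priors taken as cluster fractions and only diagonal covariances kept:

```python
import numpy as np

def class_to_clusters(X, M, iters=20, seed=0):
    """Cluster one class into M generative models c_j = (P_j, mu_j, diag(Sigma_j)).

    Plain k-means is used here only as a stand-in for the clustering step;
    priors are cluster fractions, and only diagonal covariances are kept.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute the centers.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        label = dist.argmin(axis=1)
        for j in range(M):
            if np.any(label == j):
                centers[j] = X[label == j].mean(axis=0)
    clusters = []
    for j in range(M):
        pts = X[label == j]
        if len(pts) == 0:
            continue                          # drop clusters that lost all points
        P_j = len(pts) / len(X)               # prior (weight) of the cluster
        mu_j = pts.mean(axis=0)               # cluster mean
        var_j = pts.var(axis=0) + 1e-6        # diagonal covariance, regularized
        clusters.append((P_j, mu_j, var_j))
    return clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
clusters = class_to_clusters(X, M=2)
print(sum(P for P, _, _ in clusters))  # priors sum to one (up to rounding)
```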
For the positive clusters, j ranges from 1 to M_cx; for the negative clusters, j ranges from 1 to M_cy. Hence, the total number of training clusters is M = M_cx + M_cy. In the following, we denote the space of generative models as R_G = R × R^d × R^{d×d}. The problem therefore becomes to find a linear decision boundary H(c, b) = {z ∈ R_G | cᵀz = b} (c ∈ R_G \ {0}, b ∈ R). We then transform the above generative models c_j, j = 1, ..., M, from R_G to a feature space R_f via a mapping φ : R_G → R_f. A linear decision boundary H(c, b) = {φ(z) ∈ R_f | cᵀφ(z) = b} in the feature space R_f then corresponds to a non-linear decision boundary D(c, b) = {z ∈ R_G | cᵀφ(z) = b} in the space R_G (c ∈ R_f \ {0} and b ∈ R).

Now, let the training clusters be mapped as c_x → φ(c_x) ~ {φ̄(c_x), Σ_{φ(c_x)}} and c_y → φ(c_y) ~ {φ̄(c_y), Σ_{φ(c_y)}}. A non-linear decision boundary in R_G can then be obtained by solving the minimax probability decision problem of Eq. (1) in the feature space R_f:

max_{α, c≠0, b} α
s.t. inf_{φ(c_x) ~ (φ̄(c_x), Σ_{φ(c_x)})} Pr{cᵀφ(c_x) ≥ b} ≥ α,
     inf_{φ(c_y) ~ (φ̄(c_y), Σ_{φ(c_y)})} Pr{cᵀφ(c_y) ≤ b} ≥ α.

Similar to the optimization of Eq. (1), the above optimization can be solved via

1/τ := min_c ‖Σ_{φ(c_x)}^{1/2} c‖₂ + ‖Σ_{φ(c_y)}^{1/2} c‖₂   s.t.  cᵀ(φ̄(c_x) − φ̄(c_y)) = 1.

By adopting the kernel trick similar to that in [11], [7], we can write c as a linear combination of the training clusters and then find the coefficients. Without loss of generality, c can be written as c = Σ_{i=1}^{M_cx} ν_i φ(c_xi) + Σ_{j=1}^{M_cy} ω_j φ(c_yj). Let {t_i}_{i=1}^{M} denote the set of all M = M_cx + M_cy training clusters, with t_j = c_xj for j = 1, 2, ..., M_cx and t_j = c_y(j−M_cx) for j = M_cx + 1, M_cx + 2, ..., M. The Gram matrix G can be defined as

G_ij = G(φ(t_i), φ(t_j)),   i, j = 1, 2, ..., M.    (3)

Denoting the first M_cx rows and the last M_cy rows of G as G_cx and G_cy, respectively, we get G = [G_cx; G_cy]. The block-row-averaged Gram matrix K is then obtained by setting the row average of the G_cx block and of the G_cy block to zero:

K = ( G_cx − 1_{M_cx} l̃_cx ; G_cy − 1_{M_cy} l̃_cy ) =: ( K_cx ; K_cy ),

where 1_n is a column vector of ones of dimension n. The row averages l̃_cx and l̃_cy are M-dimensional row vectors given by

(l̃_cx)_i = (1/M_cx) Σ_{j=1}^{M_cx} K(c_xj, t_i),
(l̃_cy)_i = (1/M_cy) Σ_{j=1}^{M_cy} K(c_yj, t_i).

Hence, the objective of MCPM becomes

1/τ := min_v ‖K_cx v‖₂ + ‖K_cy v‖₂   s.t.  vᵀ(l̃_cx − l̃_cy) = 1,    (4)

where v = [ν_1, ν_2, ..., ν_{M_cx}, ω_1, ω_2, ..., ω_{M_cy}]ᵀ. The decision function of MCPM is then calculated as

f_MCPM(c_z) = Σ_{i=1}^{M_cx} v_i K(c_z, c_xi) + Σ_{i=1}^{M_cy} v_{M_cx+i} K(c_z, c_yi) − b_MCPM,    (5)

and the bias term is obtained by

b_MCPM = vᵀ l̃_cx − τ ‖K_cx v‖₂ = vᵀ l̃_cy + τ ‖K_cy v‖₂.

From Eq. (4), we can see that the optimization is similar to that of the kernelized MPM; however, the number of training units is largely reduced from the number of training samples to the number of training clusters.

2008 International Joint Conference on Neural Networks (IJCNN 2008)

III. GENERALIZED PROBABILITY PRODUCT KERNEL
In solving Eq. (4), we still need a suitable distance definition to measure the similarity between two clusters in the training phase, or between a cluster and a sample vector in the test phase. In the following, we first introduce general kernels in the feature space. We then derive the generalized probability product kernel. After that, we present how to apply the generalized probability product kernel, more specifically the linear probability product kernel and the Radial Basis Function on the probability product kernel, to real applications.

A. Kernel in Feature Space
The kernel is defined in Eq. (3). Considering a linear kernel in the feature space, we can define it as

G_L(φ(t_i), φ(t_j)) = φ(t_i)ᵀ φ(t_j).    (6)

Similarly, we can define an RBF kernel in the feature space:

G_RBF(φ(t_i), φ(t_j)) = exp{−γ ‖φ(t_i) − φ(t_j)‖²}.    (7)

We can also extend the kernel in the feature space to general forms via other functions, e.g., the polynomial kernel and the hyperbolic tangent kernel [18]. This yields generalized kernels in the feature space.

B. Probability Product Kernel
Here we still need to define the inner product of two vectors in the feature space. Considering the property of the generative models we have obtained, we turn to the probability product kernel [8]. The probability product kernel defines the similarity between two distributions p and p′ by

K(p, p′) = ∫_{R^d} p(z)^ρ p′(z)^ρ dz,    (8)

where K(p, p′) is positive definite and the exponent ρ derives a set of candidate kernels. When ρ = 1, it leads to the expected likelihood kernel [8]. When p and p′ are both Gaussian distributions, i.e., p = P·p(z | μ, Σ) and p′ = P′·p(z | μ′, Σ′), K(p, p′) can be written as a function of the two generative models, i.e., K(c, c′). Further, K(c, c′) can be computed directly from the parameters of the two generative models, which avoids integrating the probability distributions over the entire input space. Hence, when ρ = 1, we have

K(c, c′) = φ(t_i)ᵀ φ(t_j)
         = P P′ (2π)^{−d/2} |Σ̃|^{1/2} |Σ|^{−1/2} |Σ′|^{−1/2} exp{ −(1/2)( μᵀ Σ^{−1} μ + μ′ᵀ Σ′^{−1} μ′ − μ̃ᵀ Σ̃ μ̃ ) },    (9)

where Σ̃ = (Σ^{−1} + Σ′^{−1})^{−1} and μ̃ = Σ^{−1} μ + Σ′^{−1} μ′.

C. Practical Solution
For real classification problems, in order to avoid computing the inverse matrices in Eq. (9), we simplify the kernel calculation by using only the diagonal entries of the covariance matrices, i.e., Σ = diag(σ²_1, ..., σ²_d). Thus, the kernel becomes

K(c, c′) = φ(c)ᵀ φ(c′) = P P′ ∏_{i=1}^{d} (2π(σ²_i + σ′²_i))^{−1/2} exp{ −(1/2) Σ_{i=1}^{d} (μ_i − μ′_i)² / (σ²_i + σ′²_i) }.    (10)
In the test phase, a test sample z is considered as the extreme case of a Gaussian distribution, with only one point in the distribution, a fixed prior, and all elements of the covariance matrix vanishing, i.e., c_z = (P_z = 1, μ_z = z, Σ_z = 0). Hence, the similarity between a training cluster and a test vector is defined by

K(c_z, c′) = φ(c_z)ᵀ φ(c′) = P′ ∏_{i=1}^{d} (2πσ′²_i)^{−1/2} exp{ −(1/2) Σ_{i=1}^{d} (z_i − μ′_i)² / σ′²_i }.    (11)

We term the calculations of Eq. (10) and Eq. (11) the linear probability product kernel (LPPK), for calculating the similarity between clusters in the training phase and between a cluster and a test vector in the test phase, respectively. After plugging Eq. (10) and Eq. (11) into the RBF form of Eq. (7), we obtain the Radial Basis Function on the probability product kernel (RBF-PPK). Similarly, by defining other measurements in the feature space, e.g., polynomial or hyperbolic tangent functions, we can extend this to a generalized probability product kernel.

Comparing the linear kernel in Eq. (6) and the RBF kernel in Eq. (7), we can see that the RBF kernel has the advantage of taking unit value when the two inputs are the same, while the linear kernel has no such property. In real applications, data usually lie in a high-dimensional space. This makes the calculated values of Eq. (10) and Eq. (7) very small and of different scales, which makes it difficult to tune parameters to obtain good results. The RBF-PPK, however, normalizes the kernel matrix and hence avoids these problems of the linear PPK.

IV. EXPERIMENTS
We carry out experiments on a two-class toy dataset and two benchmark datasets to demonstrate the effectiveness of MCPM. In the toy dataset, 2,000 data points, 1,000 points for each class, were randomly generated from a mixture of Gaussian distributions in order to visualize the learning results of the MCPM in the 2-D space. The real datasets used are two benchmark binary classification datasets, the Pima Indians diabetes dataset and the Twonorm dataset, from the machine learning repository [1], [11], [7]. The Pima Indians diabetes dataset consists of 768 instances with 8 attributes. The Twonorm dataset, consisting of 7,400 samples with 20 attributes, was generated from a multivariate normal distribution [1], [11].

A. Toy Data

B. Experimental Setup and Model Selection
In the experiments, each dataset is partitioned into 90% training and 10% test sets. The final results are averages over 10 random partitions. Comparisons are performed on MCPM, the Support Clustering Machine (SCM) [12] (both with LPPK and RBF-PPK), and the MPM. For fair comparison, we adopt the Threshold Order Dependent (TOD) algorithm [3], the same clustering method as used in [12]. The experiments are performed on a PC with a 2.13 GHz Intel Core 2 CPU and 1 GB RAM. We use Matlab 7.1 to conduct the comparisons. Several parameters need to be tuned in training the different models. For the kernelized MPM, we use the Gaussian kernel, exp(−‖x − y‖²/σ), with parameter σ. For SCM, the parameters are the trade-off parameter C and the width parameter γ when the RBF-PPK is used. For the MCPM with RBF-PPK, only the width parameter γ needs to be tuned. All these parameters are chosen via cross-validation on the training dataset.

The 2,000 data points, generated from a mixture of Gaussian distributions, are plotted in Fig. 1(a). The TOD algorithm is applied to group the x-class data into 15 positive clusters and the y-class samples into 15 negative clusters. As shown in Fig. 1(b), the training clusters are denoted by ellipses, whose sizes are proportional to the cluster weights. In the experiment, the obtained weights (priors), means, and covariance matrices of the training clusters are used as the input for SCM and MCPM. Table I reports the average training time, test time, and accuracies obtained by the kernelized MPM, SCM, and MCPM with LPPK and RBF-PPK. We can see that the time cost of MCPM is largely reduced compared to that of MPM, while the accuracy is only slightly decreased. In particular, the training time of the kernelized MPM is reduced by over 4,500 times, and there is an over 30 times reduction in the test time.
The training and test times of SCM and MCPM are nearly the same for LPPK and RBF-PPK, while SCM and MCPM with RBF-PPK outperform SCM and MCPM with LPPK, respectively, in terms of accuracy. This shows that the generalized non-linear probability product kernel, i.e., the RBF-PPK, is superior to the linear PPK on the toy data. Moreover, different from SCM, both MPM and MCPM can generate an explicit worst-case accuracy bound α. Furthermore, the bound of MCPM with RBF-PPK is tighter than those of MPM and of MCPM with LPPK. This again demonstrates the superiority of the MCPM using the generalized PPK over the other methods.

C. Benchmark Datasets
In this section we compare the performance of the proposed MCPM with the other methods on two benchmark datasets. Table II reports the average training time, test time, worst-case accuracy bounds, and accuracies on the Pima Indians diabetes dataset, while Table III reports the average results on the Twonorm dataset. From Table II and Table III, we have the following observations:
- Although the kernelized MPM has better accuracy than the linear MPM, it costs too much time in the training procedure. The MCPM overcomes this shortcoming of the kernelized MPM: it reduces the training time largely, an over 10,000 times reduction, while maintaining an accuracy comparable to that of the kernelized MPM on both datasets.
- The MCPM can also output a worst-case accuracy bound α. The bound is once again tighter than that of the MPM.
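The bound α reported in the tables can be made concrete on a small example. The sketch below solves the linear-MPM problem of Eq. (2) with a general-purpose solver rather than the dedicated SOCP machinery of [13], [15], and converts the optimal objective value into the worst-case bound via the standard MPM relation α = κ²/(1 + κ²), with κ the reciprocal of that optimal value [11]; the two-class data here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))            # class x samples (hypothetical)
Y = rng.normal(0.0, 1.0, (200, 2)) + 4.0      # class y samples (hypothetical)
x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
Sx_half = np.linalg.cholesky(np.cov(X, rowvar=False))  # Sigma_x^{1/2} factor
Sy_half = np.linalg.cholesky(np.cov(Y, rowvar=False))  # Sigma_y^{1/2} factor

# Eq. (2): min_a ||Sigma_x^{1/2} a|| + ||Sigma_y^{1/2} a||  s.t.  a^T(x_bar - y_bar) = 1.
obj = lambda a: np.linalg.norm(Sx_half.T @ a) + np.linalg.norm(Sy_half.T @ a)
con = {"type": "eq", "fun": lambda a: a @ (x_bar - y_bar) - 1.0}
res = minimize(obj, x0=np.full(2, 0.1), constraints=[con])

a = res.x
kappa = 1.0 / res.fun                         # kappa = 1 / optimal objective value
alpha = kappa ** 2 / (1.0 + kappa ** 2)       # worst-case accuracy bound (cf. [11])
b = a @ x_bar - kappa * np.linalg.norm(Sx_half.T @ a)  # bias: a^T z >= b -> class x
print(0.0 < alpha < 1.0, a @ x_bar > b > a @ y_bar)    # True True
```

With well-separated classes, α comes out close to 1, matching the intuition that the worst-case error of the separating hyperplane is small.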

[Fig. 1 appears here: (a) samples; (b) toy-data classification results of SCM, MPM and MCPM.]
Fig. 1. Kernelized MPM, SCM and MCPM with LPPK and RBF-PPK in a 2-D space. Training data are indicated with blue '+'s for class x and red marks for class y. Test samples are indicated with black marks for class x and green 'o's for class y. The training clusters are represented by ellipses with size proportional to the priors: blue ellipses for class c_x and red ellipses for class c_y. The decision boundaries constructed by the SCM with LPPK (thin green dotted line), the SCM with RBF-PPK (thick red dash-dot line), the MPM (thin blue solid line), the MCPM with LPPK (thick magenta solid line), and the MCPM with RBF-PPK (thick black dashed line) are shown. Notice that SCM and MCPM with RBF-PPK improve the test-set performance compared to SCM and MCPM with LPPK.

TABLE I
AVERAGE RESULTS ON THE SYNTHETIC DATASET.
(Methods compared: SCM (LPPK), SCM (RBF-PPK), MPM, MCPM (LPPK), MCPM (RBF-PPK); columns: Training (s), Test (s), α (%), Accuracy (%).)

TABLE II
AVERAGE RESULTS ON THE PIMA INDIANS DIABETES DATASET.
(Methods compared: SCM (LPPK), SCM (RBF-PPK), MPM (Linear), MPM (Kernel), MCPM (LPPK), MCPM (RBF-PPK); columns: Training (s), Test (s), α (%), Accuracy (%).)

- The proposed generalized probability product kernel, i.e., the RBF-PPK, can largely improve the accuracy of the traditional probability product kernel, i.e., the LPPK, in both SCM and MCPM.

The above observations validate the advantages of our proposed method and show that both the training and the test time can be reduced greatly by our MCPM method while the accuracy is maintained. Moreover, the proposed generalized probability product kernel can deliver better accuracies than the traditional probability product kernel.
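To make the comparison concrete, here is a minimal sketch of both kernels for diagonal-Gaussian clusters c = (P, μ, diag σ²): the LPPK of Eq. (10) in closed form, and the RBF-PPK of Eq. (7) expanded through the kernel trick ‖φ(c) − φ(c′)‖² = K(c, c) + K(c′, c′) − 2K(c, c′); the example clusters are hypothetical.

```python
import numpy as np

def lppk(c, c2):
    """Linear probability product kernel, Eq. (10), for c = (P, mu, var)."""
    P, mu, var = c
    P2, mu2, var2 = c2
    s = var + var2
    return P * P2 * np.prod(1.0 / np.sqrt(2.0 * np.pi * s)) \
               * np.exp(-0.5 * np.sum((mu - mu2) ** 2 / s))

def rbf_ppk(c, c2, gamma=1.0):
    """RBF on the probability product kernel, Eq. (7), via the kernel trick."""
    sq_dist = lppk(c, c) + lppk(c2, c2) - 2.0 * lppk(c, c2)
    return np.exp(-gamma * sq_dist)

c_x = (0.5, np.zeros(2), np.ones(2))
c_y = (0.5, np.full(2, 3.0), np.ones(2))
# A test point as a degenerate cluster (Eq. (11)); a tiny variance stands in
# numerically for Sigma_z = 0.
c_z = (1.0, np.array([0.1, -0.2]), np.full(2, 1e-9))

print(rbf_ppk(c_x, c_x))                 # unit self-similarity: 1.0
print(lppk(c_x, c_z) > lppk(c_y, c_z))   # z lies nearer the c_x cluster: True
```

The unit self-similarity of the RBF-PPK is exactly the normalization property the text credits for avoiding the scaling problems of the linear PPK.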
In order to examine the performance when different cluster numbers are chosen, we show the average test error rates of SCM and MCPM using LPPK and RBF-PPK with respect to the number of training clusters in Fig. 2. From this figure, we have the following observations. First, the best results of SCM and MCPM with LPPK and RBF-PPK are obtained at different numbers of training clusters. This shows that the number of clusters can indeed influence the overall accuracy; to obtain the best performance, this parameter may need to be chosen carefully. Second, in all cases, the learning algorithms using the RBF-PPK consistently outperform those using the LPPK. This clearly demonstrates the advantages of the proposed non-linear generalized PPK.

V. CONCLUSION
In this paper, we have proposed an efficient Minimax Clustering Probability Machine model. This model elegantly incorporates cluster information into the learning process so as to greatly reduce the training and test time complexity. We have also proposed a generalized probability product kernel, which has demonstrated desirable properties in measuring the similarity defined either between a pair of clusters or between a cluster and a test vector. Experimental results on both synthetic and real data show that the proposed algorithm can reduce the training and test time significantly while preserving the accuracy. Moreover, the proposed generalized probability product kernel has been shown to consistently outperform the traditional linear probability product kernel.

TABLE III
AVERAGE RESULTS ON THE TWONORM DATASET.
(Methods compared: SCM (LPPK), SCM (RBF-PPK), MPM (Linear), MPM (Kernel), MCPM (LPPK), MCPM (RBF-PPK); columns: Training (s), Test (s), α (%), Accuracy (%).)

[Fig. 2 appears here: error rates of SCM (LPPK), SCM (RBF-PPK), MCPM (LPPK), and MCPM (RBF-PPK) versus the number of clusters; (a) Pima dataset, (b) Twonorm dataset.]
Fig. 2. Average error rates of SCM and MCPM with respect to the number of clusters on the Pima dataset and the Twonorm dataset.

Several important issues deserve our attention in the future. First, the clustering and the classifier learning are currently implemented as two separate steps; it remains interesting whether these two steps can be unified into one. Second, although both theoretical justification and empirical verification have demonstrated the advantages of the proposed generalized probability product kernel, further exploration of its mathematical properties is still an important research topic. Third, for simplicity we mainly evaluate our algorithm on two-class data in this paper; extensive investigations on large-scale multi-class real data are also necessary. Finally, how to choose the optimal cluster number is also an important future research topic.

REFERENCES
[1] L. Breiman. Arcing classifiers. Technical Report 460, Statistics Department, University of California.
[2] R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1.
[3] M. Friedman and A. Kandel. Introduction to Pattern Recognition: Statistical, Structural, Neural, and Fuzzy Logic Approaches. World Scientific, Singapore.
[4] H. P. Graf, E. Cosatto, L. Bottou, I. Durdanovic, and V. Vapnik. Parallel support vector machines: The cascade SVM. In NIPS 17.
[5] K. Huang, H. Yang, I. King, and M. R. Lyu. Learning classifiers from imbalanced data based on biased minimax probability machine. In CVPR 2004, volume 2.
[6] K. Huang, H. Yang, I. King, and M. R. Lyu. Maximizing sensitivity in medical diagnosis using biased minimax probability machine. IEEE Transactions on Biomedical Engineering, 53.
[7] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan. The minimum error minimax probability machine. Journal of Machine Learning Research, 5.
[8] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. Journal of Machine Learning Research, Special Topic on Learning Theory.
[9] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation, 13(3).
[10] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan. Minimax probability machine. In NIPS 15.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan. A robust minimax approach to classification. Journal of Machine Learning Research, 3.
[12] B. Li, M. Chi, J. Fan, and X. Xue. Support cluster machine. In ICML '07.
[13] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second order cone programming. Linear Algebra and its Applications, 284.
[14] A. W. Marshall and I. Olkin. Multivariate Chebyshev inequalities. Annals of Mathematical Statistics, 31(4).
[15] Y. Nesterov and A. Nemirovsky. Interior Point Polynomial Methods in Convex Programming: Theory and Applications. Studies in Applied Mathematics, Philadelphia.
[16] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press.
[17] I. Popescu and D. Bertsimas. Optimal inequalities in probability theory: A convex optimization approach. Technical Report TM62, INSEAD.
[18] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA.
[19] T. Strohmann and G. Grudic. A formulation for minimax probability machine regression. In S. Becker, S. Thrun, and K. Obermayer, editors, NIPS 15. MIT Press.
[20] H. Yu, J. Yang, and J. Han. Classifying large data sets using SVMs with hierarchical clusters. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[21] J. Yuan, J. Li, and B. Zhang. Learning concepts from large scale imbalanced data sets using support cluster machines. In ACM Multimedia.


More information

SVM-based Supervised and Unsupervised Classification Schemes

SVM-based Supervised and Unsupervised Classification Schemes SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro

More information

Adaptive Regularization for Transductive Support Vector Machine

Adaptive Regularization for Transductive Support Vector Machine Adaptive Reguarization for Transductive Support Vector Machine Zengin Xu Custer MMCI Saarand Univ. & MPI INF Saarbrucken, Germany zxu@mpi-inf.mpg.de Rong Jin Computer Sci. & Eng. Michigan State Univ. East

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria

More information

From Margins to Probabilities in Multiclass Learning Problems

From Margins to Probabilities in Multiclass Learning Problems From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error

More information

Multilayer Kerceptron

Multilayer Kerceptron Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

Evolutionary Product-Unit Neural Networks for Classification 1

Evolutionary Product-Unit Neural Networks for Classification 1 Evoutionary Product-Unit Neura Networs for Cassification F.. Martínez-Estudio, C. Hervás-Martínez, P. A. Gutiérrez Peña A. C. Martínez-Estudio and S. Ventura-Soto Department of Management and Quantitative

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law Gauss Law 1. Review on 1) Couomb s Law (charge and force) 2) Eectric Fied (fied and force) 2. Gauss s Law: connects charge and fied 3. Appications of Gauss s Law Couomb s Law and Eectric Fied Couomb s

More information

Kernel Matching Pursuit

Kernel Matching Pursuit Kerne Matching Pursuit Pasca Vincent and Yoshua Bengio Dept. IRO, Université demontréa C.P. 6128, Montrea, Qc, H3C 3J7, Canada {vincentp,bengioy}@iro.umontrea.ca Technica Report #1179 Département d Informatique

More information

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne 17th European Signa Processing Conference (EUSIPCO 2009) Gasgow, Scotand, August 24-28, 2009 PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR Pierric Bruneau, Marc Gegon and Fabien

More information

Multicategory Classification by Support Vector Machines

Multicategory Classification by Support Vector Machines Muticategory Cassification by Support Vector Machines Erin J Bredensteiner Department of Mathematics University of Evansvie 800 Lincon Avenue Evansvie, Indiana 47722 eb6@evansvieedu Kristin P Bennett Department

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

Research Article On the Lower Bound for the Number of Real Roots of a Random Algebraic Equation

Research Article On the Lower Bound for the Number of Real Roots of a Random Algebraic Equation Appied Mathematics and Stochastic Anaysis Voume 007, Artice ID 74191, 8 pages doi:10.1155/007/74191 Research Artice On the Lower Bound for the Number of Rea Roots of a Random Agebraic Equation Takashi

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

CURRENT patent classification mainly relies on human

CURRENT patent classification mainly relies on human Large-Scae Patent Cassification with in-ax oduar Support Vector achines Xiao-Lei Chu, Chao a, Jing Li, Bao-Liang Lu Senior ember, IEEE, asao Utiyama, and Hitoshi Isahara Abstract Patent cassification is

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this

More information

Paragraph Topic Classification

Paragraph Topic Classification Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA

More information

arxiv: v1 [cs.db] 1 Aug 2012

arxiv: v1 [cs.db] 1 Aug 2012 Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient

More information

https://doi.org/ /epjconf/

https://doi.org/ /epjconf/ HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department

More information

Soft Clustering on Graphs

Soft Clustering on Graphs Soft Custering on Graphs Kai Yu 1, Shipeng Yu 2, Voker Tresp 1 1 Siemens AG, Corporate Technoogy 2 Institute for Computer Science, University of Munich kai.yu@siemens.com, voker.tresp@siemens.com spyu@dbs.informatik.uni-muenchen.de

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 54 Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

Research on liquid sloshing performance in vane type tank under microgravity

Research on liquid sloshing performance in vane type tank under microgravity IOP Conference Series: Materias Science and Engineering PAPER OPEN ACCESS Research on iquid soshing performance in vane type tan under microgravity Reated content - Numerica simuation of fuid fow in the

More information

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees Improving the Accuracy of Booean Tomography by Expoiting Path Congestion Degrees Zhiyong Zhang, Gaoei Fei, Fucai Yu, Guangmin Hu Schoo of Communication and Information Engineering, University of Eectronic

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

Chapter 2 Multi-Class Support Vector Machine

Chapter 2 Multi-Class Support Vector Machine hapter Muti-ass Support Vector Machine Zhe Wang and Xiangyang Xue Abstract Support vector machine (SVM) was initiay designed for binary cassification. To extend SVM to the muti-cass scenario, a number

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Schoo of Computer Science Probabiistic Graphica Modes Gaussian graphica modes and Ising modes: modeing networks Eric Xing Lecture 0, February 0, 07 Reading: See cass website Eric Xing @ CMU, 005-07 Network

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments Learning Structura Changes of Gaussian Graphica Modes in Controed Experiments Bai Zhang and Yue Wang Bradey Department of Eectrica and Computer Engineering Virginia Poytechnic Institute and State University

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing theorem to characterize the stationary distribution of the stochastic process with SDEs in (3) Theorem 3

More information

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers Ant Coony Agorithms for Constructing Bayesian Muti-net Cassifiers Khaid M. Saama and Aex A. Freitas Schoo of Computing, University of Kent, Canterbury, UK. {kms39,a.a.freitas}@kent.ac.uk December 5, 2013

More information

Convolutional Networks 2: Training, deep convolutional networks

Convolutional Networks 2: Training, deep convolutional networks Convoutiona Networks 2: Training, deep convoutiona networks Hakan Bien Machine Learning Practica MLP Lecture 8 30 October / 6 November 2018 MLP Lecture 8 / 30 October / 6 November 2018 Convoutiona Networks

More information

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE S CASE V MODEL FOR RANKING DATA Aberto Maydeu Oivares Instituto de Empresa Marketing Dept. C/Maria de Moina -5 28006 Madrid Spain Aberto.Maydeu@ie.edu

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

Kernel Trick Embedded Gaussian Mixture Model

Kernel Trick Embedded Gaussian Mixture Model Kerne Trick Embedded Gaussian Mixture Mode Jingdong Wang, Jianguo Lee, and Changshui Zhang State Key Laboratory of Inteigent Technoogy and Systems Department of Automation, Tsinghua University Beijing,

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

A Brief Introduction to Markov Chains and Hidden Markov Models

A Brief Introduction to Markov Chains and Hidden Markov Models A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Discriminant Analysis: A Unified Approach

Discriminant Analysis: A Unified Approach Discriminant Anaysis: A Unified Approach Peng Zhang & Jing Peng Tuane University Eectrica Engineering & Computer Science Department New Oreans, LA 708 {zhangp,jp}@eecs.tuane.edu Norbert Riede Tuane University

More information

A Sparse Covariance Function for Exact Gaussian Process Inference in Large Datasets

A Sparse Covariance Function for Exact Gaussian Process Inference in Large Datasets A Covariance Function for Exact Gaussian Process Inference in Large Datasets Arman ekumyan Austraian Centre for Fied Robotics The University of Sydney NSW 26, Austraia a.mekumyan@acfr.usyd.edu.au Fabio

More information

V.B The Cluster Expansion

V.B The Cluster Expansion V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f(q ) = exp ( βv( q )) 1, which is obtained by summing over

More information

Fast Spectral Clustering via the Nyström Method

Fast Spectral Clustering via the Nyström Method Fast Spectra Custering via the Nyström Method Anna Choromanska, Tony Jebara, Hyungtae Kim, Mahesh Mohan 3, and Caire Monteeoni 3 Department of Eectrica Engineering, Coumbia University, NY, USA Department

More information

Target Location Estimation in Wireless Sensor Networks Using Binary Data

Target Location Estimation in Wireless Sensor Networks Using Binary Data Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344

More information

An explicit Jordan Decomposition of Companion matrices

An explicit Jordan Decomposition of Companion matrices An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057

More information

Notes on Backpropagation with Cross Entropy

Notes on Backpropagation with Cross Entropy Notes on Backpropagation with Cross Entropy I-Ta ee, Dan Gowasser, Bruno Ribeiro Purue University October 3, 07. Overview This note introuces backpropagation for a common neura network muti-cass cassifier.

More information

Width of Percolation Transition in Complex Networks

Width of Percolation Transition in Complex Networks APS/23-QED Width of Percoation Transition in Compex Networs Tomer Kaisy, and Reuven Cohen 2 Minerva Center and Department of Physics, Bar-Ian University, 52900 Ramat-Gan, Israe 2 Department of Computer

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Efficient Generation of Random Bits from Finite State Markov Chains

Efficient Generation of Random Bits from Finite State Markov Chains Efficient Generation of Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Worst Case Analysis of the Analog Circuits

Worst Case Analysis of the Analog Circuits Proceedings of the 11th WSEAS Internationa Conference on CIRCUITS, Agios Nikoaos, Crete Isand, Greece, Juy 3-5, 7 9 Worst Case Anaysis of the Anaog Circuits ELENA NICULESCU*, DORINA-MIOARA PURCARU* and

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS ECCM6-6 TH EUROPEAN CONFERENCE ON COMPOSITE MATERIALS, Sevie, Spain, -6 June 04 THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS M. Wysocki a,b*, M. Szpieg a, P. Heström a and F. Ohsson c a Swerea SICOMP

More information

Improving the Reliability of a Series-Parallel System Using Modified Weibull Distribution

Improving the Reliability of a Series-Parallel System Using Modified Weibull Distribution Internationa Mathematica Forum, Vo. 12, 217, no. 6, 257-269 HIKARI Ltd, www.m-hikari.com https://doi.org/1.12988/imf.217.611155 Improving the Reiabiity of a Series-Parae System Using Modified Weibu Distribution

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

Interactive Fuzzy Programming for Two-level Nonlinear Integer Programming Problems through Genetic Algorithms

Interactive Fuzzy Programming for Two-level Nonlinear Integer Programming Problems through Genetic Algorithms Md. Abu Kaam Azad et a./asia Paciic Management Review (5) (), 7-77 Interactive Fuzzy Programming or Two-eve Noninear Integer Programming Probems through Genetic Agorithms Abstract Md. Abu Kaam Azad a,*,

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION

CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION Statistica Sinica 16(2006), 425-439 CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION Yi Lin and Ming Yuan University of Wisconsin-Madison and Georgia Institute of Technoogy

More information

V.B The Cluster Expansion

V.B The Cluster Expansion V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f( q ) = exp ( βv( q )), which is obtained by summing over

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance

On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance On the Statistica Consistency of Agorithms for Binary Cassification under Cass Imbaance Aditya Krishna Menon University of Caifornia, San iego, La Joa CA 92093, USA Harikrishna Narasimhan Shivani Agarwa

More information

On generalized quantum Turing machine and its language classes

On generalized quantum Turing machine and its language classes Proceedings of the 11th WSEAS Internationa onference on APPLIED MATHEMATIS, Daas, Texas, USA, March -4, 007 51 On generaized quantum Turing machine and its anguage casses SATOSHI IRIYAMA Toyo University

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

Supervised i-vector Modeling - Theory and Applications

Supervised i-vector Modeling - Theory and Applications Supervised i-vector Modeing - Theory and Appications Shreyas Ramoji, Sriram Ganapathy Learning and Extraction of Acoustic Patterns LEAP) Lab, Eectrica Engineering, Indian Institute of Science, Bengauru,

More information