Kernel Trick Embedded Gaussian Mixture Model


Jingdong Wang, Jianguo Lee, and Changshui Zhang

State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing, P. R. China

R. Gavaldà et al. (Eds.): ALT 2003, LNAI 2842. (c) Springer-Verlag Berlin Heidelberg 2003

Abstract. In this paper, we present a kernel trick embedded Gaussian Mixture Model (GMM), called kernel GMM. The basic idea is to embed the kernel trick into the EM algorithm and derive a parameter estimation algorithm for GMM in feature space. Kernel GMM can be viewed as a Bayesian Kernel Method. Compared with most classical kernel methods, the proposed method solves problems within a probabilistic framework. Moreover, it tackles nonlinear problems better than the traditional GMM. To avoid the great computational cost that most kernel methods incur on large scale data sets, we also employ a Monte Carlo sampling technique to speed up kernel GMM, which makes it more practical and efficient. Experimental results on synthetic and real-world data sets demonstrate that the proposed approach gives satisfactory performance.

1 Introduction

The kernel trick is an efficient method for nonlinear data analysis, first used by the Support Vector Machine (SVM) [18]. It has been pointed out that the kernel trick can be used to develop a nonlinear generalization of any algorithm that can be cast in terms of dot products. In recent years, the kernel trick has been successfully introduced into various machine learning algorithms, such as Kernel Principal Component Analysis (Kernel PCA) [14,15], Kernel Fisher Discriminant (KFD) [11], Kernel Independent Component Analysis (Kernel ICA) [7], and so on. However, in many cases we need to obtain a risk-minimization result and to incorporate prior knowledge, both of which are easily provided within a Bayesian probabilistic framework. This has motivated the combination of the kernel trick with Bayesian methods, which is called the Bayesian Kernel Method [16]. Since the Bayesian Kernel Method works in a probabilistic framework, it can realize Bayesian optimal decisions and estimate confidence or reliability easily with probabilistic criteria such as Maximum-A-Posteriori [5]. Recently some research has been done in this field. Kwok combined the evidence framework with SVM [10], and Gestel et al. [8] incorporated a Bayesian framework into SVM and KFD. Both of these works apply a Bayesian framework to known kernel methods.

On the other hand, some researchers have proposed new Bayesian methods with the kernel trick embedded, among which one of the most influential works is the Relevance Vector Machine (RVM) proposed by Tipping [17]. This paper also addresses the problem of Bayesian Kernel Methods. The proposed method embeds the kernel trick into the Expectation-Maximization (EM) algorithm [3] and derives a new parameter estimation algorithm for the Gaussian Mixture Model (GMM) in the feature space. The entire model is called the kernel Gaussian Mixture Model (kGMM).

The rest of this paper is organized as follows. Section 2 reviews some background knowledge, and Section 3 describes the kernel Gaussian Mixture Model and the corresponding parameter estimation algorithm. Experiments and results are presented in Section 4. Conclusions are drawn in the final section.

2 Preliminaries

In this section, we review some background knowledge, including the kernel trick, GMM based on the EM algorithm, and the Bayesian Kernel Method.

2.1 Kernel Trick

The Mercer kernel trick was first applied in SVM. The idea is that we can implicitly map input data into a high-dimensional feature space via a nonlinear function:

    \Phi : X \to H, \quad x \mapsto \phi(x)    (1)

A similarity measure is then defined from the dot product in the space H as follows:

    k(x, x') = ( \phi(x) \cdot \phi(x') )    (2)

where the kernel function k should satisfy Mercer's condition [18]. This allows us to handle learning algorithms using linear algebra and analytic geometry. Generally speaking, on the one hand the kernel trick lets us deal with data in the high-dimensional dot product space H, named the feature space, through the map associated with k. On the other hand, it avoids the expensive computation in the feature space by employing the kernel function k instead of directly computing dot products in H. Being an elegant way to perform nonlinear analysis, the kernel trick has been used in many other algorithms such as Kernel Fisher Discriminant [11], Kernel PCA [14,15], Kernel ICA [7], and so on.
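
The following short NumPy sketch (not from the paper; the degree-2 polynomial kernel and its explicit feature map are illustrative choices) shows the point of Eqs. (1)-(2): the kernel evaluates dot products of mapped points without ever forming phi(x) explicitly.

```python
import numpy as np

def poly2_kernel(X, Y):
    """k(x, y) = (x . y)^2 for all pairs of rows of X and Y (kernel trick: no feature map)."""
    return (X @ Y.T) ** 2

def poly2_features(X):
    """Explicit feature map of the same kernel for 2-D inputs: phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K_implicit = poly2_kernel(X, X)                          # Gram matrix via the kernel trick
K_explicit = poly2_features(X) @ poly2_features(X).T     # Gram matrix via the explicit map
assert np.allclose(K_implicit, K_explicit)               # identical, as Eq. (2) states
```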

2.2 GMM Based on the EM Algorithm

GMM is a kind of mixture density model which assumes that each component of the probabilistic model is a Gaussian density. That is to say:

    p(x | \Theta) = \sum_{i=1}^{M} \alpha_i G_i(x | \theta_i)    (3)

where x \in R^d is a random variable, the parameters \Theta = (\alpha_1, \dots, \alpha_M; \theta_1, \dots, \theta_M) satisfy \sum_{i=1}^{M} \alpha_i = 1, \alpha_i \geq 0, and G_i(x | \theta_i) is a Gaussian probability density function:

    G_\ell(x | \theta_\ell) = \frac{1}{(2\pi)^{d/2} |\Sigma_\ell|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_\ell)^T \Sigma_\ell^{-1} (x - \mu_\ell) \right\}    (4)

where \theta_\ell = (\mu_\ell, \Sigma_\ell).

GMM can be viewed as a generative model [12] or a latent variable model [6] that assumes the data set X = \{x_i\}_{i=1}^{N} is generated by M Gaussian components, and it introduces latent variables Z = \{z_i\}_{i=1}^{N} whose values indicate which component generated each datum; that is, if sample x_i is generated by the ℓ-th component then z_i = ℓ. The parameters of GMM can then be estimated by the EM algorithm [2]. The EM algorithm for GMM is an iterative procedure that estimates the new parameters from the old parameters through the following updating formulas:

    \alpha_\ell^{(t)} = \frac{1}{N} \sum_{i=1}^{N} p(\ell | x_i, \Theta^{(t-1)})

    \mu_\ell^{(t)} = \frac{\sum_{i=1}^{N} x_i \, p(\ell | x_i, \Theta^{(t-1)})}{\sum_{i=1}^{N} p(\ell | x_i, \Theta^{(t-1)})}

    \Sigma_\ell^{(t)} = \frac{\sum_{i=1}^{N} (x_i - \mu_\ell^{(t)})(x_i - \mu_\ell^{(t)})^T \, p(\ell | x_i, \Theta^{(t-1)})}{\sum_{i=1}^{N} p(\ell | x_i, \Theta^{(t-1)})}    (5)

where ℓ indexes the ℓ-th Gaussian component, and

    p(\ell | x_i, \Theta^{(t-1)}) = \frac{\alpha_\ell^{(t-1)} G(x_i | \theta_\ell^{(t-1)})}{\sum_{j=1}^{M} \alpha_j^{(t-1)} G(x_i | \theta_j^{(t-1)})}, \quad \ell = 1, \dots, M    (6)

Here \Theta^{(t-1)} = (\alpha_1^{(t-1)}, \dots, \alpha_M^{(t-1)}; \theta_1^{(t-1)}, \dots, \theta_M^{(t-1)}) are the parameters of the (t-1)-th iteration and \Theta^{(t)} = (\alpha_1^{(t)}, \dots, \alpha_M^{(t)}; \theta_1^{(t)}, \dots, \theta_M^{(t)}) are the parameters of the t-th iteration.

GMM has been successfully applied in many fields, such as parametric clustering and density estimation. However, it cannot give a simple yet satisfactory clustering result on data sets with complex structure [13], as shown in Figure 1. One alternative is to perform GMM-based clustering in another space instead of in the original data space.
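
For reference, here is a minimal NumPy/SciPy sketch of the classical input-space EM updates (5)-(6). Full covariances, a small ridge for numerical stability, and random initialization are assumptions of the sketch, not prescriptions of the paper; the feature-space version derived in Section 3 replaces these mean/covariance updates with kernel-matrix operations.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, M, n_iter=100, seed=0):
    """Classical EM for an M-component GMM on data X of shape (N, d)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.full(M, 1.0 / M)
    mu = X[rng.choice(N, M, replace=False)]                       # random initial means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])
    for _ in range(n_iter):
        # E-step: responsibilities p(l | x_i, Theta), Eq. (6)
        dens = np.column_stack([alpha[l] * multivariate_normal.pdf(X, mu[l], Sigma[l])
                                for l in range(M)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: Eq. (5)
        Nl = resp.sum(axis=0)
        alpha = Nl / N
        mu = (resp.T @ X) / Nl[:, None]
        for l in range(M):
            Xc = X - mu[l]
            Sigma[l] = (resp[:, l, None] * Xc).T @ Xc / Nl[l] + 1e-6 * np.eye(d)
    return alpha, mu, Sigma, resp
```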

Fig. 1. Data set of two concentric circles with 1,000 samples; points marked with one symbol belong to one cluster and points marked with the other symbol belong to the other. (a) is the partition result of the traditional GMM; (b) is the result achieved by kGMM using a polynomial kernel of degree 2; (c) shows the probability that each point belongs to the outer circle. The whiter the point, the higher the probability.

2.3 Bayesian Kernel Method

The Bayesian Kernel Method can be viewed as a combination of the Bayesian method and the kernel method, and it inherits merits from both. It can tackle problems nonlinearly like a kernel method, and it obtains estimation results within a probabilistic framework like classical Bayesian methods. Much work has been done in this field, typically in three different ways:

- interpreting existing kernel methods, such as SVM and other kernel methods, in a Bayesian framework, as in [10,8];
- employing kernel methods in traditional Bayesian methods, such as Gaussian Processes and Laplacian Processes [16];
- proposing new methods in a Bayesian framework with the kernel trick embedded, such as the Relevance Vector Machine (RVM) [17] and the Bayes Point Machine [9].

We intend to embed the kernel trick into the Gaussian Mixture Model; this work belongs to the second category of Bayesian Kernel Methods.

3 Kernel Trick Embedded GMM

As mentioned before, the Gaussian Mixture Model cannot obtain simple yet satisfactory results on data sets with complex structure, so we employ the kernel trick to realize a Bayesian Kernel version of GMM. Our basic idea is to embed the kernel trick into the parameter estimation procedure of GMM. In this section, we first describe GMM in feature space, then present several properties in feature space, then formulate the kernel Gaussian Mixture Model and the corresponding parameter estimation algorithm, and finally discuss the algorithm.

3.1 GMM in Feature Space

GMM in the feature space induced by a map \phi(\cdot) associated with the kernel function k can be written as

    p(\phi(x) | \Theta) = \sum_{i=1}^{M} \alpha_i G(\phi(x) | \theta_i)    (7)

and the EM updating formulas (5) and (6) are replaced by the following:

    \alpha_\ell^{(t)} = \frac{1}{N} \sum_{i=1}^{N} p(\ell | \phi(x_i), \Theta^{(t-1)})

    \mu_\ell^{(t)} = \frac{\sum_{i=1}^{N} \phi(x_i) \, p(\ell | \phi(x_i), \Theta^{(t-1)})}{\sum_{i=1}^{N} p(\ell | \phi(x_i), \Theta^{(t-1)})}

    \Sigma_\ell^{(t)} = \frac{\sum_{i=1}^{N} (\phi(x_i) - \mu_\ell^{(t)})(\phi(x_i) - \mu_\ell^{(t)})^T \, p(\ell | \phi(x_i), \Theta^{(t-1)})}{\sum_{i=1}^{N} p(\ell | \phi(x_i), \Theta^{(t-1)})}    (8)

where ℓ indexes the ℓ-th Gaussian component, and

    p(\ell | \phi(x_i), \Theta^{(t-1)}) = \frac{\alpha_\ell^{(t-1)} G(\phi(x_i) | \theta_\ell^{(t-1)})}{\sum_{j=1}^{M} \alpha_j^{(t-1)} G(\phi(x_i) | \theta_j^{(t-1)})}, \quad \ell = 1, \dots, M    (9)

However, computing GMM directly with formulas (8) and (9) in a high-dimensional feature space is computationally expensive and thus impractical. We employ the kernel trick to overcome this difficulty. In the following section, we give some properties based on the Mercer kernel trick that allow the GMM parameters to be estimated in feature space.

3.2 Properties in Feature Space

For convenience, the notation used in feature space is given first, and then three properties are presented.

Notations. In all formulas, bold capital letters denote matrices, bold italic letters denote vectors, and italic lower-case letters denote scalars. The subscript ℓ refers to the ℓ-th Gaussian component, and the superscript t refers to the t-th iteration of the EM procedure. A^T denotes the transpose of matrix A. The remaining notation is listed in Table 1.

Table 1. Notation list

    p_{\ell i}^{(t)} = p(\ell | \phi(x_i), \Theta^{(t)})                                        Posterior probability that \phi(x_i) belongs to the ℓ-th component.
    (w_{\ell i}^{(t)})^2 = p_{\ell i}^{(t)} / \sum_{j=1}^{N} p_{\ell j}^{(t)}                   Ratio by which \phi(x_i) is occupied by the ℓ-th Gaussian component.
    \mu_\ell^{(t)} = \sum_{i=1}^{N} \phi(x_i) (w_{\ell i}^{(t)})^2                              Mean vector of the ℓ-th Gaussian component.
    \tilde\phi_\ell^{(t)}(x_i) = \phi(x_i) - \mu_\ell^{(t)}                                     Centered image of \phi(x_i).
    \Sigma_\ell^{(t)} = \sum_{i=1}^{N} \tilde\phi_\ell^{(t)}(x_i) \tilde\phi_\ell^{(t)}(x_i)^T (w_{\ell i}^{(t)})^2   Covariance matrix of the ℓ-th Gaussian component.
    (K_\ell^{(t)})_{ij} = ( w_{\ell i} \phi(x_i) \cdot w_{\ell j} \phi(x_j) )                                         Kernel matrix.
    (\tilde K_\ell^{(t)})_{ij} = ( w_{\ell i} \tilde\phi_\ell(x_i) \cdot w_{\ell j} \tilde\phi_\ell(x_j) )            Centered kernel matrix.
    (K'^{(t)}_\ell)_{ij} = ( \phi(x_i) \cdot w_{\ell j} \phi(x_j) )                                                   Projecting kernel matrix.
    (\tilde K'^{(t)}_\ell)_{ij} = ( \tilde\phi_\ell(x_i) \cdot w_{\ell j} \tilde\phi_\ell(x_j) )                      Centered projecting kernel matrix.
    \lambda_{\ell e}^{(t)}, V_{\ell e}^{(t)}                                                    Eigenvalue and eigenvector of \Sigma_\ell^{(t)}.
    \lambda_{\ell e}^{(t)}, \beta_{\ell e}^{(t)}                                                Eigenvalue and eigenvector of \tilde K_\ell^{(t)}.

Properties in Feature Space. According to Lemma 1 in the Appendix, we obtain the first property.

[Property 1] The centered kernel matrix \tilde K_\ell^{(t)} and the centered projecting kernel matrix \tilde K'^{(t)}_\ell are computed by the following formulas:

    \tilde K_\ell^{(t)} = K_\ell^{(t)} - W_\ell^{(t)} K_\ell^{(t)} - K_\ell^{(t)} W_\ell^{(t)} + W_\ell^{(t)} K_\ell^{(t)} W_\ell^{(t)}

    \tilde K'^{(t)}_\ell = K'^{(t)}_\ell - W'^{(t)}_\ell K_\ell^{(t)} - K'^{(t)}_\ell W_\ell^{(t)} + W'^{(t)}_\ell K_\ell^{(t)} W_\ell^{(t)}    (10)

where W_\ell^{(t)} = w_\ell^{(t)} (w_\ell^{(t)})^T, W'^{(t)}_\ell = 1_N (w_\ell^{(t)})^T, w_\ell^{(t)} = [w_{\ell 1}^{(t)}, \dots, w_{\ell N}^{(t)}]^T, and 1_N is an N-dimensional column vector with all entries equal to 1. This property gives the way to center the kernel matrix and the projecting kernel matrix.

According to Lemma 2 in the Appendix and Property 1, we obtain the second property.

[Property 2] The feature-space covariance matrix \Sigma_\ell^{(t)} and the centered kernel matrix \tilde K_\ell^{(t)} have the same nonzero eigenvalues, and the following equivalence relation holds:

    \Sigma_\ell^{(t)} V_{\ell e}^{(t)} = \lambda_{\ell e} V_{\ell e}^{(t)}  \Longleftrightarrow  \tilde K_\ell^{(t)} \beta_{\ell e}^{(t)} = \lambda_{\ell e} \beta_{\ell e}^{(t)}    (11)

where V_{\ell e}^{(t)} and \beta_{\ell e}^{(t)} are eigenvectors of \Sigma_\ell^{(t)} and \tilde K_\ell^{(t)}, respectively. This property enables us to compute eigenvectors from the centered kernel matrix instead of from the feature-space covariance matrix of Equation (8).

With the first two properties, we do not need to compute the means and covariance matrices in feature space; we perform the eigen-decomposition of the centered kernel matrix instead.
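
The following small NumPy check (a sketch, not the authors' code; the degree-2 polynomial feature map and the random posteriors are illustrative assumptions) verifies Properties 1 and 2 numerically by comparing the kernel-side quantities against an explicitly constructed feature space.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
# explicit degree-2 polynomial feature map, so the covariance can be formed directly
Phi = np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1]])

p = rng.random(8)                       # posteriors p_li of one component (illustrative values)
w = np.sqrt(p / p.sum())                # weights w_li with sum_i w_li^2 = 1

K = np.outer(w, w) * (Phi @ Phi.T)      # K_ij = (w_i phi(x_i) . w_j phi(x_j))
W = np.outer(w, w)
K_tilde = K - W @ K - K @ W + W @ K @ W            # Property 1, Eq. (10)

mu = Phi.T @ (w ** 2)                              # feature-space mean of the component
Phic = Phi - mu                                    # centered images phi~(x_i)
K_tilde_direct = np.outer(w, w) * (Phic @ Phic.T)  # centered kernel matrix built explicitly
assert np.allclose(K_tilde, K_tilde_direct)

Sigma = (Phic * (w ** 2)[:, None]).T @ Phic        # feature-space covariance (3 x 3 here)
ev_K = np.sort(np.linalg.eigvalsh(K_tilde))[::-1]
ev_S = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
assert np.allclose(ev_K[:3], ev_S, atol=1e-8)      # Property 2: same nonzero eigenvalues
```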

The feature space has a very high dimensionality, which makes it intractable to compute the complete Gaussian probability density G_\ell(\phi(x_j) | \theta_\ell) directly. However, Moghaddam and Pentland's work [22] provides some motivation. It divides the original high-dimensional space into two parts: the principal subspace and its orthogonal complement. The principal subspace has dimension d_\phi. The complete Gaussian density can thus be approximated by

    \hat G_\ell(\phi(x_j) | \theta_\ell) = \left[ \frac{1}{(2\pi)^{d_\phi/2} \prod_{e=1}^{d_\phi} \lambda_{\ell e}^{1/2}} \exp\left( -\frac{1}{2} \sum_{e=1}^{d_\phi} \frac{y_{\ell e}^2}{\lambda_{\ell e}} \right) \right] \left[ \frac{1}{(2\pi\rho_\ell)^{(N-d_\phi)/2}} \exp\left( -\frac{\varepsilon_\ell^2(x_j)}{2\rho_\ell} \right) \right]    (12)

where y_{\ell e} = \tilde\phi_\ell(x_j)^T V_{\ell e} (here V_{\ell e} is the e-th eigenvector of \Sigma_\ell), \rho_\ell is the weight ratio, and \varepsilon_\ell^2(x_j) is the residual reconstruction error. On the right side of Equation (12), the first factor is computed from the principal subspace, and the second factor is computed from the orthogonal complement subspace. The optimal value of \rho_\ell can be determined by minimizing a cost function. From an information-theoretic point of view, the cost function should be the Kullback-Leibler divergence between the true density G_\ell(\phi(x_j) | \theta_\ell) and its approximation \hat G_\ell(\phi(x_j) | \theta_\ell):

    J(\rho) = \int G(\phi(x) | \theta) \log \frac{G(\phi(x) | \theta)}{\hat G(\phi(x) | \theta)} \, d\phi(x)    (13)

Plugging (12) into the above equation, it can easily be shown that

    J(\rho) = \frac{1}{2} \sum_{e=d_\phi+1}^{N} \left[ \frac{\lambda_e}{\rho} - 1 + \log \frac{\rho}{\lambda_e} \right]

Setting the derivative \partial J / \partial \rho = \frac{1}{2} \sum_{e=d_\phi+1}^{N} \left( \frac{1}{\rho} - \frac{\lambda_e}{\rho^2} \right) to zero yields the optimal value

    \rho = \frac{1}{N - d_\phi} \sum_{e=d_\phi+1}^{N} \lambda_e

According to Property 2, \Sigma_\ell and \tilde K_\ell have the same nonzero eigenvalues, so by employing the properties of symmetric matrices we obtain

    \rho_\ell = \frac{1}{N - d_\phi} \left[ \| \Sigma_\ell^{1/2} \|_F^2 - \sum_{e=1}^{d_\phi} \lambda_{\ell e} \right] = \frac{1}{N - d_\phi} \left[ \| \tilde K_\ell^{1/2} \|_F^2 - \sum_{e=1}^{d_\phi} \lambda_{\ell e} \right]    (14)

where \| A \|_F = \sqrt{\mathrm{trace}(A A^T)} is the Frobenius matrix norm, so that \| \Sigma_\ell^{1/2} \|_F^2 = \mathrm{trace}(\Sigma_\ell) is the sum of all eigenvalues of \Sigma_\ell.

The residual reconstruction error \varepsilon_\ell^2(x_j) = \| \tilde\phi_\ell(x_j) \|^2 - \sum_{e=1}^{d_\phi} y_{\ell e}^2 can easily be obtained by employing the kernel trick:

    \varepsilon_\ell^2(x_j) = \tilde k_\ell(x_j, x_j) - \sum_{e=1}^{d_\phi} y_{\ell e}^2    (15)

where \tilde k_\ell(x_j, x_j) = ( \tilde\phi_\ell(x_j) \cdot \tilde\phi_\ell(x_j) ) is expressed through kernel evaluations. According to Lemma 3 in the Appendix,

    y_{\ell e} = \tilde\phi_\ell(x_j)^T V_{\ell e} = ( V_{\ell e} \cdot \tilde\phi_\ell(x_j) ) = \beta_{\ell e}^T \Gamma_{\ell j}    (16)

where \Gamma_{\ell j} is the j-th column of the centered projecting kernel matrix \tilde K'_\ell. Note that the centered kernel matrix \tilde K_\ell is used to obtain the eigenvalues \lambda_{\ell e} and eigenvectors \beta_{\ell e}, whereas the centered projecting kernel matrix \tilde K'_\ell is used to compute y_{\ell e} as in Equation (16); neither of these two matrices can be omitted from the training procedure.

Employing all these results, we obtain the third property.

[Property 3] The Gaussian probability density function G_\ell(\phi(x_j) | \theta_\ell) can be approximated by \hat G_\ell(\phi(x_j) | \theta_\ell) as shown in (12), where \rho_\ell, \varepsilon_\ell^2(x_j) and y_{\ell e} are given by (14), (15) and (16), respectively.

We stress that the approximation of G_\ell(\phi(x_j) | \theta_\ell) by \hat G_\ell(\phi(x_j) | \theta_\ell) is complete, since it represents not only the principal subspace but also the orthogonal complement subspace.

From these three properties we can draw the following conclusions.

- The Mercer kernel trick is introduced to indirectly compute dot products in the high-dimensional feature space.
- The probability that a sample belongs to the ℓ-th component can be computed through the centered kernel matrix and the centered projecting kernel matrix, instead of through the mean vector and covariance matrix as in traditional GMM.
- Since we do not need the full eigen-decomposition of the centered kernel matrix, we can approximate the Gaussian density with only the largest d_\phi principal components of the centered kernel matrix, and the approximation does not depend strongly on d_\phi because it is complete and optimal.

With these three properties, we can formulate our kernel Gaussian Mixture Model.
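
Before moving to the full algorithm, the following NumPy sketch (an assumed interface, not the authors' implementation) evaluates the logarithm of the approximated density of Property 3 for all training points of one component, given its centered kernel matrix and centered projections. The kernel-PCA style normalization of the eigenvectors beta is an assumption made here so that the feature-space eigenvectors V_e have unit length.

```python
import numpy as np

def approx_log_density(K_tilde, Gamma, sq_norm, d_phi):
    """Log of G_hat(phi(x_j)|theta) of Eq. (12) for every sample j of one component.
    K_tilde : (N, N) centered kernel matrix of the component.
    Gamma   : (N, N) matrix whose j-th row is Gamma_j, i.e. Gamma[j, i] = (w_i phi~(x_i) . phi~(x_j)).
    sq_norm : (N,)  squared feature-space norms ||phi~(x_j)||^2.
    d_phi   : dimension of the principal subspace."""
    N = K_tilde.shape[0]
    lam, beta = np.linalg.eigh(K_tilde)
    lam = np.maximum(lam[::-1][:d_phi], 1e-12)                # leading eigenvalues, guarded
    beta = beta[:, ::-1][:, :d_phi] / np.sqrt(lam)            # assumed kernel-PCA normalization: ||V_e|| = 1
    rho = (np.trace(K_tilde) - lam.sum()) / (N - d_phi)       # optimal rho, Eq. (14)
    Y = Gamma @ beta                                          # y_e = beta_e^T Gamma_j, Eq. (16)
    eps2 = np.maximum(sq_norm - (Y ** 2).sum(axis=1), 0.0)    # residual error, Eq. (15)
    log_in = -0.5 * (d_phi * np.log(2 * np.pi) + np.log(lam).sum()
                     + (Y ** 2 / lam).sum(axis=1))            # principal-subspace factor
    log_out = -0.5 * ((N - d_phi) * np.log(2 * np.pi * rho) + eps2 / rho)   # complement factor
    return log_in + log_out
```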

3.3 Kernel GMM and the Parameter Estimation Algorithm

In feature space, the kernel matrices replace the mean vectors and covariance matrices of the input space as the representation of each Gaussian component. The parameters of each component are therefore not mean vectors and covariance matrices but kernel matrices. In fact, computing mean vectors and covariance matrices in feature space is intractable anyway, because the feature space has a very high or even infinite dimension. Fortunately, with the kernel trick embedded, computing with the kernel matrix is quite feasible, since the dimension of the principal subspace is bounded by the data size N.

Consequently, the parameters that kernel GMM needs to estimate are the prior probabilities \alpha_\ell^{(t)}, the centered kernel matrices \tilde K_\ell^{(t)}, the centered projecting kernel matrices \tilde K'^{(t)}_\ell, and w_\ell^{(t)} (see Table 1). That is to say, the M-component kernel GMM is determined by the parameters \theta_\ell = ( \alpha_\ell^{(t)}, w_\ell^{(t)}, \tilde K_\ell^{(t)}, \tilde K'^{(t)}_\ell ), \ell = 1, \dots, M.

According to the properties of the previous sections, the EM algorithm for parameter estimation of kGMM can be summarized as in Table 2. Assuming the number of Gaussian components is M, we initialize the posterior probabilities p_{\ell i} that each sample belongs to each Gaussian component. The algorithm terminates when it converges or when the preset maximum number of iterations is reached.

Table 2. Parameter estimation algorithm for kGMM

Step 0. Initialize all p_{\ell i}^{(0)} (\ell = 1, \dots, M; i = 1, \dots, N), set t = 0, set t_max, and set the stopping condition to false.
Step 1. While the stopping condition is false, set t = t + 1 and do Steps 2-7.
Step 2. Compute \alpha_\ell^{(t)}, w_{\ell i}^{(t)}, W_\ell^{(t)}, W'^{(t)}_\ell, K_\ell^{(t)} and K'^{(t)}_\ell according to the notation in Table 1.
Step 3. Compute the matrices \tilde K_\ell^{(t)} and \tilde K'^{(t)}_\ell via Property 1.
Step 4. Compute the largest d_\phi eigenvalues and eigenvectors of the centered kernel matrices \tilde K_\ell^{(t)}.
Step 5. Compute \hat G(\phi(x_j) | \theta_\ell^{(t)}) via Property 3.
Step 6. Compute all posterior probabilities p_{\ell i}^{(t)} via (9).
Step 7. Test the stopping condition: if t > t_max or \sum_{\ell=1}^{M} \sum_{i=1}^{N} ( p_{\ell i}^{(t)} - p_{\ell i}^{(t-1)} )^2 < \epsilon, set the stopping condition to true; otherwise loop back to Step 1.
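
To make the steps of Table 2 concrete, here is a hedged end-to-end sketch in NumPy (an illustrative reimplementation under the assumptions stated in the comments, not the authors' code). It takes a precomputed Gram matrix k(x_i, x_j); the centered quantities are built in a form equivalent to Property 1, and the eigenvectors of the centered kernel matrix are given the usual kernel-PCA normalization so that the feature-space eigenvectors have unit length.

```python
import numpy as np

def kgmm_fit(K0, M, d_phi, n_iter=30, seed=0):
    """K0: (N, N) Gram matrix k(x_i, x_j); M: number of components; d_phi: principal
    subspace dimension per component. Returns posteriors p_li and mixing weights alpha_l."""
    N = K0.shape[0]
    rng = np.random.default_rng(seed)
    post = rng.dirichlet(np.ones(M), size=N)              # Step 0: random initial posteriors
    for _ in range(n_iter):                               # Steps 1-7
        logdens = np.empty((N, M))
        for l in range(M):
            wsq = post[:, l] / post[:, l].sum()           # (w_li)^2, Table 1
            m = K0 @ wsq
            C = K0 - m[:, None] - m[None, :] + wsq @ m    # C_ij = (phi~_i . phi~_j)
            w = np.sqrt(wsq)
            K_t = np.outer(w, w) * C                      # centered kernel matrix K~_l
            lam, beta = np.linalg.eigh(K_t)
            lam = np.maximum(lam[::-1][:d_phi], 1e-12)    # leading eigenvalues, guarded
            beta = beta[:, ::-1][:, :d_phi] / np.sqrt(lam)   # assumed kernel-PCA normalization
            rho = max((np.trace(K_t) - lam.sum()) / (N - d_phi), 1e-12)   # Eq. (14)
            Y = (C * w[None, :]) @ beta                   # projections y_je, Eq. (16)
            eps2 = np.maximum(np.diag(C) - (Y ** 2).sum(1), 0.0)          # Eq. (15)
            logdens[:, l] = (-0.5 * (d_phi * np.log(2 * np.pi) + np.log(lam).sum()
                                     + (Y ** 2 / lam).sum(1))
                             - 0.5 * ((N - d_phi) * np.log(2 * np.pi * rho) + eps2 / rho))
        alpha = post.mean(axis=0)                         # prior probabilities alpha_l
        logp = np.log(alpha)[None, :] + logdens           # Step 6: posteriors via Eq. (9)
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
    return post, alpha
```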

3.4 Discussion of the Algorithm

Computational Cost and Speedup Techniques for Large Scale Problems. With the kernel trick embedded, the computational cost of kernel eigen-decomposition based methods is almost entirely due to the eigen-decomposition step. The cost therefore mainly depends on the size of the kernel matrix, i.e. the size of the data set. If the size N is not very large (e.g. N <= 1,000), obtaining the full eigen-decomposition is not a problem. If N is large enough, however, the computation becomes prohibitive: as is known, if N > 5,000 it is impossible to finish a full eigen-decomposition within hours even on the fastest current PCs. Yet N is usually very large in problems such as data mining.

Fortunately, as pointed out in Section 3.2, we do not need the full eigen-decomposition for the components of kGMM; we only need to estimate the largest d_\phi nonzero eigenvalues and the corresponding eigenvectors. For large scale problems we can assume d_\phi << N. Several techniques can be adopted to estimate the largest d_\phi components for kernel methods. The first is based on traditional Orthogonal Iteration or Lanczos Iteration. The second makes the kernel matrix sparse by sampling techniques [1]. The third applies the Nyström method to speed up kernel machines [19]. However, these three techniques are all somewhat complicated. In this paper, we adopt a much simpler but practical technique proposed by Taylor et al. [20]. It assumes that all samples forming the kernel matrix of each component are drawn from an underlying density p(x), so that the problem can be written as a continuous eigen-problem:

    \int k(x, y) \, p(x) \, \beta_i(x) \, dx = \lambda_i \beta_i(y)    (17)

where \lambda_i and \beta_i(y) are an eigenvalue and eigenfunction, and k is a given kernel function. The integral can be approximated by the Monte Carlo method using a subset of samples \{x_i\}_{i=1}^{m} (m << N, m >= d_\phi) drawn according to p(x):

    \int k(x, y) \, p(x) \, \beta_i(x) \, dx \approx \frac{1}{N} \sum_{j=1}^{N} k(x_j, y) \beta_i(x_j)    (18)

Plugging in y = x_k for k = 1, \dots, N, we obtain a matrix eigen-problem:

    \frac{1}{N} \sum_{j=1}^{N} k(x_j, x_k) \beta_i(x_j) = \hat\lambda_i \beta_i(x_k)    (19)

where \hat\lambda_i is the approximation of the eigenvalue \lambda_i. This approximation approach has been proved feasible and has a bounded error. We apply it to our parameter estimation algorithm for large scale problems (N > 1,000). In our algorithm, the underlying density of component ℓ is approximated by \hat G(\phi(x) | \theta_\ell^{(t)}). We sample a subset of size m_\ell^{(t)} according to \hat G(\phi(x) | \theta_\ell) and perform the full eigen-decomposition on this subset to obtain the largest d_\phi eigen-components. With this Monte Carlo sampling technique, the computational cost on large scale problems is reduced greatly, and the memory needed by the parameter estimation algorithm is also reduced greatly. This makes the proposed kGMM efficient and practical.
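
A minimal sketch of this sampling speed-up follows (not the authors' implementation; the interface and the optional posterior-weighted sampling are assumptions). It approximates the leading eigenvalues by eigen-decomposing only an m x m kernel matrix built from a subsample, following Eqs. (17)-(19).

```python
import numpy as np

def subsampled_eigs(X, kernel, d_phi, m, weights=None, seed=0):
    """Approximate the d_phi leading eigenvalues/eigenvectors of the kernel operator
    from a subsample of size m << N, as in Eq. (19). `weights` may carry the current
    component posteriors so that sampling follows the component's density."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    p = None if weights is None else weights / weights.sum()
    idx = rng.choice(N, size=m, replace=False, p=p)     # Monte Carlo subsample
    Km = kernel(X[idx], X[idx])                         # m x m kernel matrix
    lam, vec = np.linalg.eigh(Km / m)                   # hat-lambda_i of Eq. (19)
    return lam[::-1][:d_phi], vec[:, ::-1][:, :d_phi], idx

# usage sketch with an RBF kernel:
# rbf = lambda A, B: np.exp(-0.0015 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
# lam, vec, idx = subsampled_eigs(X_train, rbf, d_phi=40, m=500)
```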

Comparison with Related Work. There is still other work related to ours. One major example is the spectral clustering algorithm [21]. Spectral clustering can be regarded as using an RBF-kernel-based method to extract features and then clustering with K-means. Compared with spectral clustering, the proposed kGMM has at least two advantages: (1) kGMM provides results in a probabilistic framework and can incorporate prior information easily; (2) kGMM can be used in supervised learning problems as a density estimation method. These advantages encourage us to apply the proposed kGMM.

A Possible Misunderstanding. We emphasize a possible misunderstanding of the proposed model. One might think that simply running GMM in the reduced-dimensional space obtained by Kernel PCA would achieve the same result as kGMM: just project the data onto the first d_\phi dimensions of the feature space by Kernel PCA, and then perform GMM parameter estimation in that principal subspace. However, the choice of a proper d_\phi is then a critical problem, and the performance depends entirely on this choice. If d_\phi is too large, the estimation is not feasible, because probability estimation demands that the number of samples be large compared with the dimension d_\phi, and the computational cost also increases greatly. On the contrary, a small d_\phi makes the estimated parameters represent the data poorly. The proposed kGMM does not have this problem, since the approximated density function is complete and optimal under the minimum Kullback-Leibler divergence criterion. Moreover, kGMM allows different components to have different d_\phi. All of this improves the flexibility and widens the applicability of the proposed kGMM.

4 Experiments

In this section, two experiments are performed to validate the proposed kGMM against the traditional GMM. First, kGMM is employed as an unsupervised learning (clustering) method on synthetic data sets. Second, kGMM is employed as a supervised density estimation method for real-world handwritten digit recognition.

4.1 Synthetic Data Clustering Using Kernel GMM

To provide an intuitive comparison between the proposed kGMM and the traditional GMM, we first conduct experiments on synthetic 2-D data sets. The data sets, each with 1,000 samples, are depicted in Figure 1 and Figure 2. For the traditional GMM, all samples are used to estimate the parameters of a two-component mixture of Gaussians. When the algorithm stops, each sample is assigned to one of the components (clusters) according to its posterior. The clustering results of the traditional GMM are shown in Figure 1(a) and Figure 2(a); they are obviously not satisfactory. In contrast, using kGMM with a polynomial kernel of degree 2, d_\phi = 4 for each Gaussian component, and the same clustering scheme as the traditional GMM, we achieve the promising results shown in Figure 1(b) and Figure 2(b). Besides, kGMM provides probabilistic information, as in Figure 1(c) and Figure 2(c), which most classical kernel methods cannot provide.
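
As a usage illustration, the sketch below (the data generation parameters and the inhomogeneous form of the degree-2 polynomial kernel are assumptions of the sketch) builds a two-concentric-circles data set like that of Fig. 1 and clusters it with the kgmm_fit sketch given after Table 2.

```python
import numpy as np

# reuses kgmm_fit from the sketch after Table 2
rng = np.random.default_rng(0)
n = 500
angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
radii = np.concatenate([np.full(n, 1.0), np.full(n, 3.0)]) + 0.1 * rng.normal(size=2 * n)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

K0 = (X @ X.T + 1.0) ** 2              # degree-2 polynomial kernel (inhomogeneous form assumed)
post, alpha = kgmm_fit(K0, M=2, d_phi=4, n_iter=30)

labels = post.argmax(axis=1)           # hard cluster assignment, as in Fig. 1(b)
outer_prob = post[:, labels[n]]        # soft membership of the outer-circle cluster, cf. Fig. 1(c)
```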

Fig. 2. Data set consisting of 1,000 samples; points marked with one symbol belong to one cluster and points marked with the other symbol belong to the other. (a) is the partition result of the traditional GMM; (b) is the result achieved by kGMM; (c) shows the probability that each point belongs to the left cluster. The whiter the point, the higher the probability.

4.2 USPS Data Set Recognition Using Kernel GMM

Kernel GMM is also applied to a real-world problem, US Postal Service (USPS) handwritten digit recognition. The data set consists of 9,226 grayscale images of size 16x16, divided into a training set of 7,219 images and a test set of 2,007 images. The original input data is simply the vector form of the digit image, i.e. the input feature space has dimensionality 256. Optionally, a linear discriminant analysis (LDA) can be performed to reduce the dimensionality of the feature space; if LDA is performed, the feature space dimensionality becomes 39. For each category \omega, a density p(x | \omega) is estimated with a 4-component GMM on the training set. To classify a test sample x, we use the Bayesian decision rule

    \omega^* = \arg\max_\omega \{ p(x | \omega) P(\omega) \}, \quad \omega = 1, \dots, 10    (20)

where P(\omega) is the prior probability of category \omega. In this experiment we set P(\omega) = 1/10, i.e. all categories have equal prior probability.

For comparison, kGMM adopts the same experimental scheme as the traditional GMM, except that it uses an RBF kernel function k(x, x') = \exp( -\gamma \| x - x' \|^2 ) with \gamma = 0.0015, 2 Gaussian mixture components, and d_\phi = 40 for each Gaussian component. The experimental results of GMM in the original input space, GMM in the LDA space (from [4]), and kGMM are shown in Table 3. We can see that kGMM, with fewer components, performs considerably better than the traditional GMM with or without LDA. Although the kGMM result is not the state-of-the-art result on USPS, it could be improved further by incorporating invariance prior knowledge using the tangent distance as in [4].

Table 3. Comparison of results on the USPS data set

    Method        Best error rate
    GMM           8.0%
    LDA + GMM     6.7%
    Kernel GMM    4.3%
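
A small sketch of the decision rule (20) follows; the interface is an assumption (per-class log-densities are taken as given, e.g. produced by per-class kGMM models trained as described above).

```python
import numpy as np

def bayes_classify(log_densities, priors=None):
    """log_densities: (n_test, n_classes) array of log p(x | omega).
    Returns the arg-max class of p(x | omega) P(omega) for each test point, Eq. (20)."""
    n_classes = log_densities.shape[1]
    if priors is None:
        priors = np.full(n_classes, 1.0 / n_classes)     # equal priors, as in the experiment
    return np.argmax(log_densities + np.log(priors)[None, :], axis=1)
```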

5 Conclusion

In this paper, we presented a kernel Gaussian Mixture Model and derived a parameter estimation algorithm by embedding the kernel trick into the EM algorithm. Furthermore, we adopted a Monte Carlo sampling technique to speed up kGMM on large scale problems, which makes it more practical and efficient. Compared with most classical kernel methods, kGMM can solve problems in a probabilistic framework. Moreover, it can tackle nonlinear problems better than the traditional GMM. Experimental results on synthetic and real-world data sets show that the proposed approach gives satisfactory performance. Our future work will focus on incorporating prior knowledge, such as invariance, into kGMM and on enriching its applications.

Acknowledgements. The authors would like to thank the anonymous reviewers for their helpful comments, and also thank Jason Xu for helpful conversations about this work.

References

1. Achlioptas, D., McSherry, F. and Schölkopf, B.: Sampling techniques for kernel methods. In Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge MA (2002)
2. Bilmes, J. A.: A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report, UC Berkeley, ICSI-TR (1997)
3. Bishop, C. M.: Neural Networks for Pattern Recognition. Oxford University Press (1995)
4. Dahmen, J., Keysers, D., Ney, H. and Güld, M. O.: Statistical Image Object Recognition using Mixture Densities. Journal of Mathematical Imaging and Vision, 14(3) (2001)
5. Duda, R. O., Hart, P. E. and Stork, D. G.: Pattern Classification, 2nd Edition. John Wiley & Sons, New York (2001)
6. Everitt, B. S.: An Introduction to Latent Variable Models. Chapman and Hall, London (1984)
7. Bach, F. R. and Jordan, M. I.: Kernel Independent Component Analysis. Journal of Machine Learning Research, 3 (2002)
8. Gestel, T. V., Suykens, J. A. K., Lanckriet, G., Lambrechts, A., De Moor, B. and Vandewalle, J.: Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis. Neural Computation, 15(5) (2002)

9. Herbrich, R., Graepel, T. and Campbell, C.: Bayes Point Machines: Estimating the Bayes Point in Kernel Space. In Proceedings of the International Joint Conference on Artificial Intelligence, Workshop on Support Vector Machines (1999)
10. Kwok, J. T.: The Evidence Framework Applied to Support Vector Machines. IEEE Transactions on Neural Networks, Vol. 11 (2000)
11. Mika, S., Rätsch, G., Weston, J., Schölkopf, B. and Müller, K. R.: Fisher discriminant analysis with kernels. In IEEE Workshop on Neural Networks for Signal Processing IX (1999)
12. Mjolsness, E. and Decoste, D.: Machine Learning for Science: State of the Art and Future Prospects. Science, Vol. 293 (2001)
13. Roberts, S. J.: Parametric and Non-Parametric Unsupervised Cluster Analysis. Pattern Recognition, Vol. 30, No. 2 (1997)
14. Schölkopf, B., Smola, A. J. and Müller, K. R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5) (1998)
15. Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K. R., Rätsch, G. and Smola, A.: Input Space vs. Feature Space in Kernel-Based Methods. IEEE Transactions on Neural Networks, Vol. 10, No. 5 (1999)
16. Schölkopf, B. and Smola, A. J.: Learning with Kernels: Support Vector Machines, Regularization and Beyond. MIT Press, Cambridge MA (2002)
17. Tipping, M. E.: Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research (2001)
18. Vapnik, V.: The Nature of Statistical Learning Theory, 2nd Edition. Springer-Verlag, New York (1997)
19. Williams, C. and Seeger, M.: Using the Nyström Method to Speed Up Kernel Machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems (NIPS) 13. MIT Press, Cambridge MA (2001)
20. Taylor, J. S., Williams, C., Cristianini, N. and Kandola, J.: On the Eigenspectrum of the Gram Matrix and Its Relationship to the Operator Eigenspectrum. In N. Cesa-Bianchi et al. (Eds.): ALT 2002, LNAI 2533, Springer-Verlag, Berlin Heidelberg (2002)
21. Ng, A. Y., Jordan, M. I. and Weiss, Y.: On Spectral Clustering: Analysis and an Algorithm. In Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge MA (2002)
22. Moghaddam, B. and Pentland, A.: Probabilistic visual learning for object representation. IEEE Transactions on PAMI, Vol. 19, No. 7 (1997)

Appendix

For convenience, the subscripts denoting Gaussian components and the superscripts denoting iterations are omitted in this part.

[Lemma 1] Suppose \phi(\cdot) is a mapping function that satisfies the Mercer conditions, as in Equation (1) of Section 2, with N training samples X = \{x_i\}_{i=1}^{N}. Let w = [w_1, \dots, w_N]^T \in R^N be an N-dimensional column vector, and define

    \mu = \sum_{i=1}^{N} \phi(x_i) w_i^2, \quad \tilde\phi(x_i) = \phi(x_i) - \mu

Then the following statements hold:

(1) If K is the N x N kernel matrix with K_{ij} = ( w_i \phi(x_i) \cdot w_j \phi(x_j) ), and \tilde K is the N x N matrix centered in the feature space, with

    \tilde K_{ij} = ( w_i \tilde\phi(x_i) \cdot w_j \tilde\phi(x_j) ),

then

    \tilde K = K - W K - K W + W K W    (a1)

where W = w w^T.

(2) If K' is the N x N projecting kernel matrix with K'_{ij} = ( \phi(x_i) \cdot w_j \phi(x_j) ), and \tilde K' is the N x N matrix centered in the feature space, with \tilde K'_{ij} = ( \tilde\phi(x_i) \cdot w_j \tilde\phi(x_j) ), then

    \tilde K' = K' - W' K - K' W + W' K W    (a2)

where W' = 1_N w^T and 1_N is an N-dimensional column vector with all entries equal to 1.

Proof: (1)

    \tilde K_{ij} = ( w_i \tilde\phi(x_i) \cdot w_j \tilde\phi(x_j) ) = w_i \tilde\phi(x_i)^T w_j \tilde\phi(x_j)
                 = \left( w_i \big( \phi(x_i) - \sum_{k=1}^{N} \phi(x_k) w_k^2 \big) \right)^T \left( w_j \big( \phi(x_j) - \sum_{k=1}^{N} \phi(x_k) w_k^2 \big) \right)
                 = ( w_i \phi(x_i)^T w_j \phi(x_j) ) - \sum_{k=1}^{N} w_i w_k ( w_k \phi(x_k)^T w_j \phi(x_j) )
                   - \sum_{k=1}^{N} ( w_i \phi(x_i)^T w_k \phi(x_k) ) w_k w_j
                   + w_i \sum_{k=1}^{N} \sum_{n=1}^{N} w_k w_n ( w_k \phi(x_k)^T w_n \phi(x_n) ) w_j
                 = K_{ij} - \sum_{k=1}^{N} w_i w_k K_{kj} - \sum_{k=1}^{N} w_k K_{ik} w_j + w_i \sum_{k=1}^{N} \sum_{n=1}^{N} w_k w_n K_{kn} w_j

which in matrix form is exactly the compact expression (a1). Statement (2) is proved similarly.

[Lemma 2] Suppose \Sigma is the covariance matrix

    \Sigma = \sum_{i=1}^{N} \tilde\phi(x_i) \tilde\phi(x_i)^T w_i^2    (a3)

Then the following hold:

(1) \Sigma V = \lambda V  \Longleftrightarrow  \tilde K \beta = \lambda \beta.
(2) V = \sum_{i=1}^{N} \beta_i \tilde\phi(x_i) w_i.

Proof: (1) First we prove the forward direction. If \Sigma V = \lambda V, then the solution V lies in the space spanned by w_1 \tilde\phi(x_1), \dots, w_N \tilde\phi(x_N), and we have two useful consequences: first, we can consider the equivalent system

    \lambda ( w_k \tilde\phi(x_k) \cdot V ) = ( w_k \tilde\phi(x_k) \cdot \Sigma V ), \quad \text{for all } k = 1, \dots, N    (a4)

and second, there exist coefficients \beta_i (i = 1, \dots, N) such that

    V = \sum_{i=1}^{N} \beta_i w_i \tilde\phi(x_i)    (a5)

Combining (a4) and (a5), we get

    \lambda \sum_{i=1}^{N} \beta_i ( w_k \tilde\phi(x_k) \cdot w_i \tilde\phi(x_i) )
        = \sum_{i=1}^{N} \beta_i \left( w_k \tilde\phi(x_k) \cdot \sum_{j=1}^{N} w_j \tilde\phi(x_j) ( w_j \tilde\phi(x_j) \cdot w_i \tilde\phi(x_i) ) \right)

which can be read as

    \lambda \tilde K \beta = \tilde K^2 \beta

where \beta = [\beta_1, \dots, \beta_N]^T. Since \tilde K is a symmetric matrix, it has a set of eigenvectors spanning the whole space, and thus

    \lambda \beta = \tilde K \beta    (a6)

The reverse direction is proved similarly. Statement (2) is easy to prove, so the proof is omitted.

[Lemma 3] If x \in R^d is a sample, with \tilde\phi(x) = \phi(x) - \mu, then

    ( V \cdot \tilde\phi(x_j) ) = \sum_{i=1}^{N} \beta_i ( w_i \tilde\phi(x_i) \cdot \tilde\phi(x_j) ) = \beta^T \Gamma    (a7)

where \Gamma is the j-th column of the centered projecting matrix \tilde K'.

Proof: This lemma follows directly from (a5).


Introduction to Simulation - Lecture 13. Convergence of Multistep Methods. Jacob White. Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Introduction to Simuation - Lecture 13 Convergence of Mutistep Methods Jacob White Thans to Deepa Ramaswamy, Micha Rewiensi, and Karen Veroy Outine Sma Timestep issues for Mutistep Methods Loca truncation

More information

Primal and dual active-set methods for convex quadratic programming

Primal and dual active-set methods for convex quadratic programming Math. Program., Ser. A 216) 159:469 58 DOI 1.17/s117-15-966-2 FULL LENGTH PAPER Prima and dua active-set methods for convex quadratic programming Anders Forsgren 1 Phiip E. Gi 2 Eizabeth Wong 2 Received:

More information

AALBORG UNIVERSITY. The distribution of communication cost for a mobile service scenario. Jesper Møller and Man Lung Yiu. R June 2009

AALBORG UNIVERSITY. The distribution of communication cost for a mobile service scenario. Jesper Møller and Man Lung Yiu. R June 2009 AALBORG UNIVERSITY The distribution of communication cost for a mobie service scenario by Jesper Møer and Man Lung Yiu R-29-11 June 29 Department of Mathematica Sciences Aaborg University Fredrik Bajers

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Fast Spectral Clustering via the Nyström Method

Fast Spectral Clustering via the Nyström Method Fast Spectra Custering via the Nyström Method Anna Choromanska, Tony Jebara, Hyungtae Kim, Mahesh Mohan 3, and Caire Monteeoni 3 Department of Eectrica Engineering, Coumbia University, NY, USA Department

More information

6 Wave Equation on an Interval: Separation of Variables

6 Wave Equation on an Interval: Separation of Variables 6 Wave Equation on an Interva: Separation of Variabes 6.1 Dirichet Boundary Conditions Ref: Strauss, Chapter 4 We now use the separation of variabes technique to study the wave equation on a finite interva.

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

Error Bounds on the SCISSORS Approximation Method

Error Bounds on the SCISSORS Approximation Method pubs.acs.org/jcim Error Bounds on the SCISSORS Approximation Method Imran S. Haque and Vijay S. Pande*,, Department of Computer Science and Department of Chemistry, Stanford University, Stanford, Caifornia,

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing theorem to characterize the stationary distribution of the stochastic process with SDEs in (3) Theorem 3

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Least-squares Independent Component Analysis

Least-squares Independent Component Analysis Neura Computation, vo.23, no., pp.284 30, 20. Least-squares Independent Component Anaysis Taiji Suzuki Department of Mathematica Informatics, The University of Tokyo 7-3- Hongo, Bunkyo-ku, Tokyo 3-8656,

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

Math 124B January 17, 2012

Math 124B January 17, 2012 Math 124B January 17, 212 Viktor Grigoryan 3 Fu Fourier series We saw in previous ectures how the Dirichet and Neumann boundary conditions ead to respectivey sine and cosine Fourier series of the initia

More information

The Symmetric and Antipersymmetric Solutions of the Matrix Equation A 1 X 1 B 1 + A 2 X 2 B A l X l B l = C and Its Optimal Approximation

The Symmetric and Antipersymmetric Solutions of the Matrix Equation A 1 X 1 B 1 + A 2 X 2 B A l X l B l = C and Its Optimal Approximation The Symmetric Antipersymmetric Soutions of the Matrix Equation A 1 X 1 B 1 + A 2 X 2 B 2 + + A X B C Its Optima Approximation Ying Zhang Member IAENG Abstract A matrix A (a ij) R n n is said to be symmetric

More information