Kernel Trick Embedded Gaussian Mixture Model
Jingdong Wang, Jianguo Lee, and Changshui Zhang
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing, P. R. China

Abstract. In this paper, we present a kernel trick embedded Gaussian Mixture Model (GMM), called kernel GMM. The basic idea is to embed the kernel trick into the EM algorithm and deduce a parameter estimation algorithm for GMM in feature space. Kernel GMM can be viewed as a Bayesian Kernel Method. Compared with most classical kernel methods, the proposed method can solve problems in a probabilistic framework. Moreover, it can tackle nonlinear problems better than the traditional GMM. To avoid the great computational cost that most kernel methods incur on large scale data sets, we also employ a Monte Carlo sampling technique to speed up kernel GMM, making it more practical and efficient. Experimental results on synthetic and real-world data sets demonstrate that the proposed approach performs satisfactorily.

1 Introduction

The kernel trick is an efficient method for nonlinear data analysis, used early on by the Support Vector Machine (SVM) [18]. It has been pointed out that the kernel trick can be used to develop a nonlinear generalization of any algorithm that can be cast in terms of dot products. In recent years, the kernel trick has been successfully introduced into various machine learning algorithms, such as Kernel Principal Component Analysis (Kernel PCA) [14,15], Kernel Fisher Discriminant (KFD) [11], Kernel Independent Component Analysis (Kernel ICA) [7] and so on. However, in many cases we need to obtain risk minimization results and to incorporate prior knowledge, both of which are easily provided within a Bayesian probabilistic framework. This motivates the combination of the kernel trick and Bayesian methods, which is called the Bayesian Kernel Method [16].
Since the Bayesian Kernel Method works in a probabilistic framework, it can realize Bayesian optimal decisions and estimate confidence or reliability easily with probabilistic criteria such as Maximum-A-Posteriori [5]. Recently some research has been done in this field. Kwok combined the evidence framework with SVM [10]; Gestel et al. [8] incorporated the Bayesian framework with SVM and KFD. Both of these works apply the Bayesian framework to known kernel methods. On the other hand, some researchers have proposed

R. Gavaldà et al. (Eds.): ALT 2003, LNAI 2842. © Springer-Verlag Berlin Heidelberg 2003
new Bayesian methods with the kernel trick embedded, among which one of the most influential works is the Relevance Vector Machine (RVM) proposed by Tipping [17].

This paper also addresses the problem of the Bayesian Kernel Method. The proposed method embeds the kernel trick into the Expectation-Maximization (EM) algorithm [3] and deduces a new parameter estimation algorithm for the Gaussian Mixture Model (GMM) in feature space. The entire model is called the kernel Gaussian Mixture Model (kGMM).

The rest of this paper is organized as follows. Section 2 reviews some background knowledge, and Section 3 describes the kernel Gaussian Mixture Model and the corresponding parameter estimation algorithm. Experiments and results are presented in Section 4. Conclusions are drawn in the final section.

2 Preliminaries

In this section, we review some background knowledge including the kernel trick, GMM based on the EM algorithm, and the Bayesian Kernel Method.

2.1 Kernel Trick

The Mercer kernel trick was first applied in SVM. The idea is that we can implicitly map input data into a high dimensional feature space via a nonlinear function:

$$\Phi : X \to H,\quad x \mapsto \varphi(x)\tag{1}$$

A similarity measure is then defined from the dot product in the space $H$ as follows:

$$k(x, x') = \left(\varphi(x)\cdot\varphi(x')\right)\tag{2}$$

where the kernel function $k$ should satisfy Mercer's condition [18]. This allows us to handle learning algorithms using linear algebra and analytic geometry. Generally speaking, on the one hand the kernel trick lets us deal with data in the high-dimensional dot product space $H$, named the feature space, through the map associated with $k$. On the other hand, it avoids the expensive computation in feature space by employing the kernel function $k$ instead of directly computing dot products in $H$. Being an elegant way of performing nonlinear analysis, the kernel trick has been used in many other algorithms such as Kernel Fisher Discriminant [11], Kernel PCA [14,15], Kernel ICA [7] and so on.
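As a concrete illustration (our own sketch, not from the paper), the kernel trick replaces an explicit feature map by kernel evaluations; below are Gram matrices for the two kernels used later in the experiments, the degree-2 polynomial kernel and the RBF kernel:

```python
import numpy as np

def polynomial_kernel(X, Y, degree=2):
    # k(x, x') = (x . x')^degree: implicit map into monomial features
    return (X @ Y.T) ** degree

def rbf_kernel(X, Y, gamma=0.0015):
    # k(x, x') = exp(-gamma * ||x - x'||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

X = np.random.default_rng(0).standard_normal((5, 3))
K_poly = polynomial_kernel(X, X)
K_rbf = rbf_kernel(X, X)
```

Both matrices are symmetric and positive semi-definite, which is exactly what Mercer's condition guarantees for valid kernels.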
2.2 GMM Based on the EM Algorithm

GMM is a kind of mixture density model which assumes that each component of the probabilistic model is a Gaussian density. That is to say:
$$p(x\mid\Theta)=\sum_{i=1}^{M}\alpha_i\,G_i(x\mid\theta_i)\tag{3}$$

where $x\in\mathbb{R}^d$ is a random variable, the parameters $\Theta=(\alpha_1,\dots,\alpha_M;\theta_1,\dots,\theta_M)$ satisfy $\sum_{i=1}^{M}\alpha_i=1$, $\alpha_i\ge 0$, and $G_i(x\mid\theta_i)$ is a Gaussian probability density function:

$$G_l(x\mid\theta_l)=\frac{1}{(2\pi)^{d/2}|\Sigma_l|^{1/2}}\exp\Big\{-\frac12(x-\mu_l)^T\Sigma_l^{-1}(x-\mu_l)\Big\}\tag{4}$$

where $\theta_l=(\mu_l,\Sigma_l)$. GMM can be viewed as a generative model [12] or a latent variable model [6] that assumes the data set $X=\{x_i\}_{i=1}^{N}$ is generated by $M$ Gaussian components, and introduces latent variables $Z=\{z_i\}_{i=1}^{N}$ whose values indicate which component generates each datum: if sample $x_i$ is generated by the $l$-th component, then $z_i=l$. The parameters of GMM can then be estimated by the EM algorithm [2]. The EM algorithm for GMM is an iterative procedure which estimates the new parameters in terms of the old parameters via the following update formulas:

$$\alpha_l^{(t)}=\frac{1}{N}\sum_{i=1}^{N}p(l\mid x_i,\Theta^{(t-1)}),\qquad
\mu_l^{(t)}=\frac{\sum_{i=1}^{N}x_i\,p(l\mid x_i,\Theta^{(t-1)})}{\sum_{i=1}^{N}p(l\mid x_i,\Theta^{(t-1)})},$$
$$\Sigma_l^{(t)}=\frac{\sum_{i=1}^{N}(x_i-\mu_l^{(t)})(x_i-\mu_l^{(t)})^T\,p(l\mid x_i,\Theta^{(t-1)})}{\sum_{i=1}^{N}p(l\mid x_i,\Theta^{(t-1)})}\tag{5}$$

where $l$ indexes the $l$-th Gaussian component, and

$$p(l\mid x_i,\Theta^{(t-1)})=\frac{\alpha_l^{(t-1)}G(x_i\mid\theta_l^{(t-1)})}{\sum_{j=1}^{M}\alpha_j^{(t-1)}G(x_i\mid\theta_j^{(t-1)})},\quad l=1,\dots,M\tag{6}$$

Here $\Theta^{(t-1)}=(\alpha_1^{(t-1)},\dots,\alpha_M^{(t-1)};\theta_1^{(t-1)},\dots,\theta_M^{(t-1)})$ are the parameters of the $(t-1)$-th iteration and $\Theta^{(t)}=(\alpha_1^{(t)},\dots,\alpha_M^{(t)};\theta_1^{(t)},\dots,\theta_M^{(t)})$ are the parameters of the $t$-th iteration.

GMM has been successfully applied in many fields, such as parametric clustering and density estimation. However, it cannot give a simple yet satisfactory clustering result on data sets with complex structure [13], as shown in Figure 1. One alternative is to perform GMM based clustering in another space instead of the original data space.
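The update formulas (5) and (6) amount to a short NumPy loop; the following is a minimal sketch of the standard EM iteration (our own illustration, not the paper's code, with a small diagonal regularizer added for numerical stability):

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Multivariate Gaussian density, equation (4), evaluated row-wise."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)              # L z = (x - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return np.exp(-0.5 * np.sum(z**2, axis=0)
                  - 0.5 * (d * np.log(2 * np.pi) + logdet))

def em_gmm(X, M, iters=50, seed=0):
    """Plain EM for an M-component GMM, following (5) and (6)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.full(M, 1.0 / M)
    mu = X[rng.choice(N, size=M, replace=False)].copy()
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])
    for _ in range(iters):
        # E-step: posteriors p(l | x_i, Theta), equation (6)
        dens = np.stack([alpha[l] * gauss_pdf(X, mu[l], Sigma[l])
                         for l in range(M)], axis=1)        # (N, M)
        p = dens / dens.sum(axis=1, keepdims=True)
        # M-step: updates of alpha, mu, Sigma, equation (5)
        Nl = p.sum(axis=0)
        alpha = Nl / N
        mu = (p.T @ X) / Nl[:, None]
        for l in range(M):
            Xc = X - mu[l]
            Sigma[l] = (p[:, l, None] * Xc).T @ Xc / Nl[l] + 1e-6 * np.eye(d)
    return alpha, mu, Sigma, p

# toy usage: two well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 1.0, (100, 2)),
               rng.normal(3.0, 1.0, (100, 2))])
alpha, mu, Sigma, p = em_gmm(X, M=2)
```

It is exactly this procedure that the next sections lift into feature space via the kernel trick.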
Fig. 1. Data set of two concentric circles with 1,000 samples; points with different markers belong to the two clusters. (a) is the partition result of traditional GMM; (b) is the result achieved by kGMM using a polynomial kernel of degree 2; (c) shows the probability that each point belongs to the outer circle. The whiter the point, the higher the probability.

2.3 Bayesian Kernel Method

The Bayesian Kernel Method can be viewed as a combination of Bayesian methods and kernel methods, and inherits merits from both: it can tackle problems nonlinearly like a kernel method, and it obtains estimation results within a probabilistic framework like classical Bayesian methods. Much work has been done in this field, along three typical lines:

- interpreting known kernel methods such as SVM in a Bayesian framework, as in [10,8];
- employing kernel methods within traditional Bayesian methods, such as Gaussian Processes and Laplacian Processes [16];
- proposing new methods in a Bayesian framework with the kernel trick embedded, such as the Relevance Vector Machine (RVM) [17] and the Bayes Point Machine [9].

We intend to embed the kernel trick into the Gaussian Mixture Model; this work belongs to the second category of Bayesian Kernel Methods.

3 Kernel Trick Embedded GMM

As mentioned before, the Gaussian Mixture Model cannot obtain simple yet satisfactory results on data sets with complex structure, so we consider employing the kernel trick to realize a Bayesian kernel version of GMM. Our basic idea is to embed the kernel trick into the parameter estimation procedure of GMM. In this section, we first describe GMM in feature space, then present some properties of the feature space, then formulate the kernel Gaussian Mixture Model and the corresponding parameter estimation algorithm, and finally discuss the algorithm.
3.1 GMM in Feature Space

GMM in the feature space induced by a map $\varphi(\cdot)$ associated with the kernel function $k$ can easily be rewritten as

$$p(\varphi(x)\mid\Theta)=\sum_{i=1}^{M}\alpha_i\,G(\varphi(x)\mid\theta_i)\tag{7}$$

and the EM update formulas (5) and (6) can be replaced by the following:

$$\alpha_l^{(t)}=\frac{1}{N}\sum_{i=1}^{N}p(l\mid\varphi(x_i),\Theta^{(t-1)}),\qquad
\mu_l^{(t)}=\frac{\sum_{i=1}^{N}\varphi(x_i)\,p(l\mid\varphi(x_i),\Theta^{(t-1)})}{\sum_{i=1}^{N}p(l\mid\varphi(x_i),\Theta^{(t-1)})},$$
$$\Sigma_l^{(t)}=\frac{\sum_{i=1}^{N}(\varphi(x_i)-\mu_l^{(t)})(\varphi(x_i)-\mu_l^{(t)})^T\,p(l\mid\varphi(x_i),\Theta^{(t-1)})}{\sum_{i=1}^{N}p(l\mid\varphi(x_i),\Theta^{(t-1)})}\tag{8}$$

where $l$ indexes the $l$-th Gaussian component, and

$$p(l\mid\varphi(x_i),\Theta^{(t-1)})=\frac{\alpha_l^{(t-1)}G(\varphi(x_i)\mid\theta_l^{(t-1)})}{\sum_{j=1}^{M}\alpha_j^{(t-1)}G(\varphi(x_i)\mid\theta_j^{(t-1)})},\quad l=1,\dots,M\tag{9}$$

However, computing GMM directly with formulas (8) and (9) in a high dimensional feature space is computationally expensive and thus impractical. We employ the kernel trick to overcome this difficulty. In the following section, we give some properties based on the Mercer kernel trick for estimating the GMM parameters in feature space.

3.2 Properties in Feature Space

For convenience, we first give notation for the feature space and then present three properties.

Notation. In all formulas, bold capital letters denote matrices, bold italic letters denote vectors, and italic lower case letters denote scalars. The subscript $l$ indexes the $l$-th Gaussian component; the superscript $t$ indexes the $t$-th iteration of the EM procedure. $A^T$ denotes the transpose of matrix $A$. Other notation is shown in Table 1.
Table 1. Notation List

- $p_{li}^{(t)} = p(l\mid\varphi(x_i),\Theta^{(t)})$: posterior probability that $\varphi(x_i)$ belongs to the $l$-th component.
- $w_{li}^{(t)}$: weight of sample $i$ in component $l$; $(w_{li}^{(t)})^2 = p_{li}^{(t)}\big/\sum_{j=1}^{N}p_{lj}^{(t)}$ represents the ratio by which $\varphi(x_i)$ is occupied by the $l$-th Gaussian component.
- $\mu_l^{(t)} = \sum_{i=1}^{N}\varphi(x_i)(w_{li}^{(t)})^2$: mean vector of the $l$-th Gaussian component.
- $\tilde\varphi_l^{(t)}(x_i) = \varphi(x_i)-\mu_l^{(t)}$: centered image of $\varphi(x_i)$.
- $\Sigma_l^{(t)} = \sum_{i=1}^{N}\tilde\varphi_l(x_i)\tilde\varphi_l(x_i)^T (w_{li}^{(t)})^2$: covariance matrix of the $l$-th Gaussian component.
- $(K_l^{(t)})_{ij} = \big(w_{li}\varphi(x_i)\cdot w_{lj}\varphi(x_j)\big)$: kernel matrix.
- $(\tilde K_l^{(t)})_{ij} = \big(w_{li}\tilde\varphi_l(x_i)\cdot w_{lj}\tilde\varphi_l(x_j)\big)$: centered kernel matrix.
- $\big((K_l^{(t)})'\big)_{ij} = \big(\varphi(x_i)\cdot w_{lj}\varphi(x_j)\big)$: projecting kernel matrix.
- $\big((\tilde K_l^{(t)})'\big)_{ij} = \big(\tilde\varphi_l(x_i)\cdot w_{lj}\tilde\varphi_l(x_j)\big)$: centered projecting kernel matrix.
- $\lambda_{le}^{(t)},\,V_{le}^{(t)}$: eigenvalue and eigenvector of $\Sigma_l^{(t)}$.
- $\lambda_{le}^{(t)},\,\beta_{le}^{(t)}$: eigenvalue and eigenvector of $\tilde K_l^{(t)}$.

Properties in Feature Space. According to Lemma 1 in the Appendix, we obtain the first property.

[Property 1] The centered kernel matrix $\tilde K_l^{(t)}$ and the centered projecting kernel matrix $(\tilde K_l^{(t)})'$ are computed as

$$\tilde K_l^{(t)}=K_l^{(t)}-W_l^{(t)}K_l^{(t)}-K_l^{(t)}W_l^{(t)}+W_l^{(t)}K_l^{(t)}W_l^{(t)}$$
$$(\tilde K_l^{(t)})'=(K_l^{(t)})'-(W_l^{(t)})'K_l^{(t)}-(K_l^{(t)})'W_l^{(t)}+(W_l^{(t)})'K_l^{(t)}W_l^{(t)}\tag{10}$$

where $W_l^{(t)} = w_l^{(t)}(w_l^{(t)})^T$, $(W_l^{(t)})' = \mathbf{1}_N (w_l^{(t)})^T$, $w_l^{(t)} = [w_{l1}^{(t)},\dots,w_{lN}^{(t)}]^T$, and $\mathbf{1}_N$ is the $N$-dimensional column vector with all entries equal to 1. This property gives the way to center the kernel matrix and the projecting kernel matrix.

According to Lemma 2 in the Appendix and Property 1, we obtain the second property.

[Property 2] The feature space covariance matrix $\Sigma_l^{(t)}$ and the centered kernel matrix $\tilde K_l^{(t)}$ have the same nonzero eigenvalues, and the following equivalence holds:

$$\Sigma_l^{(t)}V_{le}^{(t)}=\lambda_e V_{le}^{(t)}\iff \tilde K_l^{(t)}\beta_{le}^{(t)}=\lambda_e\beta_{le}^{(t)}\tag{11}$$

where $V_{le}^{(t)}$ and $\beta_{le}^{(t)}$ are eigenvectors of $\Sigma_l^{(t)}$ and $\tilde K_l^{(t)}$ respectively. This property enables us to compute eigenvectors from the centered kernel matrix instead of from the feature space covariance matrix in Equation (8).
With the first two properties, we do not need to compute the means and covariance matrices in feature space; we perform the eigendecomposition of the centered kernel matrix instead.
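The two properties can be checked numerically with a linear kernel, where the feature map is the identity and explicit centering is possible. The following sketch (ours, not from the paper) verifies both the weighted centering formula (10) and the eigenvalue equivalence (11):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
N = X.shape[0]

# weights w_i normalized so that sum_i w_i^2 = 1 (posterior-derived in the paper)
w = rng.random(N)
w /= np.linalg.norm(w)

# linear kernel: K_ij = (w_i x_i) . (w_j x_j)
K = (w[:, None] * X) @ (w[None, :] * X.T)
W = np.outer(w, w)

# Property 1: centered kernel matrix
K_tilde = K - W @ K - K @ W + W @ K @ W

# explicit feature-space check: mu = sum_i w_i^2 x_i, phi_tilde_i = x_i - mu
mu = (w**2) @ X
Xc = X - mu
K_explicit = (w[:, None] * Xc) @ (w[None, :] * Xc.T)
assert np.allclose(K_tilde, K_explicit)

# Property 2: Sigma = sum_i w_i^2 phi_tilde_i phi_tilde_i^T shares its
# nonzero eigenvalues with K_tilde (both equal A A^T / A^T A for A_i = w_i phi_tilde_i)
Sigma = (w[:, None] * Xc).T @ (w[:, None] * Xc)
ev_S = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
ev_K = np.sort(np.linalg.eigvalsh(K_tilde))[::-1]
assert np.allclose(ev_S, ev_K[:3])
```

The equivalence holds because $\tilde K = AA^T$ and $\Sigma = A^TA$ for the matrix $A$ whose rows are $w_i\tilde\varphi(x_i)$, and $AA^T$ and $A^TA$ always share their nonzero spectrum.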
Feature space has a very high dimensionality, which makes it intractable to compute the complete Gaussian probability density $G(\varphi(x_j)\mid\theta_l)$ directly. However, Moghaddam and Pentland's work [22] offers a way out: it divides the original high dimensional space into two parts, the principal subspace and its orthogonal complement, where the principal subspace has dimension $d_\varphi$. The complete Gaussian density can then be approximated by

$$\hat G(\varphi(x_j)\mid\theta_l)=\left[\frac{\exp\Big(-\frac12\sum_{e=1}^{d_\varphi}\frac{y_e^2}{\lambda_e}\Big)}{(2\pi)^{d_\varphi/2}\prod_{e=1}^{d_\varphi}\lambda_e^{1/2}}\right]\cdot\left[\frac{\exp\Big(-\frac{\varepsilon^2(x_j)}{2\rho}\Big)}{(2\pi\rho)^{(N-d_\varphi)/2}}\right]\tag{12}$$

where $y_e=\tilde\varphi(x_j)^T V_e$ (with $V_e$ the $e$-th eigenvector of $\Sigma_l$), $\rho$ is the weight ratio, and $\varepsilon^2(x_j)$ is the residual reconstruction error. On the right side of (12), the first factor comes from the principal subspace and the second from the orthogonal complement subspace.

The optimal value of $\rho$ is determined by minimizing a cost function. From an information-theoretic point of view, the cost function should be the Kullback-Leibler divergence between the true density $G(\varphi(x_j)\mid\theta_l)$ and its approximation $\hat G(\varphi(x_j)\mid\theta_l)$:

$$J(\rho)=\int G(\varphi(x)\mid\theta_l)\log\frac{G(\varphi(x)\mid\theta_l)}{\hat G(\varphi(x)\mid\theta_l)}\,d\varphi(x)\tag{13}$$

Plugging (12) into the above, it is easily shown that

$$J(\rho)=\frac12\sum_{e=d_\varphi+1}^{N}\Big[\frac{\lambda_e}{\rho}-1+\log\frac{\rho}{\lambda_e}\Big]$$

Solving $\partial J/\partial\rho=0$ yields the optimal value

$$\rho=\frac{1}{N-d_\varphi}\sum_{e=d_\varphi+1}^{N}\lambda_e$$

According to Property 2, $\Sigma_l$ and $\tilde K_l$ have the same nonzero eigenvalues, and since the sum of the eigenvalues of a symmetric matrix equals its trace, we obtain

$$\rho=\frac{1}{N-d_\varphi}\Big[\operatorname{tr}(\Sigma_l)-\sum_{e=1}^{d_\varphi}\lambda_e\Big]=\frac{1}{N-d_\varphi}\Big[\operatorname{tr}(\tilde K_l)-\sum_{e=1}^{d_\varphi}\lambda_e\Big]\tag{14}$$
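A small sketch of this step (our own, with hypothetical function names): keep the top $d_\varphi$ eigen-components of a centered kernel matrix and set $\rho$ to the average of the discarded eigenvalues, computed from the trace as in (14):

```python
import numpy as np

def subspace_density_params(K_tilde, d_phi):
    """Largest d_phi eigenpairs of a centered kernel matrix, plus the
    optimal complement weight rho from equation (14)."""
    N = K_tilde.shape[0]
    evals, evecs = np.linalg.eigh(K_tilde)        # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]    # descending order
    lam = evals[:d_phi]
    # sum of the discarded eigenvalues = trace minus the kept ones
    rho = (np.trace(K_tilde) - lam.sum()) / (N - d_phi)
    return lam, evecs[:, :d_phi], rho

# toy usage on a random PSD matrix standing in for a centered kernel matrix
A = np.random.default_rng(0).standard_normal((8, 8))
lam, V, rho = subspace_density_params(A @ A.T, 3)
```

For a positive semi-definite matrix the discarded eigenvalues are nonnegative, so $\rho \ge 0$ always holds here.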
The residual reconstruction error

$$\varepsilon^2(x_j)=\|\tilde\varphi(x_j)\|^2-\sum_{e=1}^{d_\varphi}y_e^2\tag{15}$$

can easily be obtained by employing the kernel trick, since $\|\tilde\varphi(x_j)\|^2$ is a centered kernel evaluation. According to Lemma 3 in the Appendix,

$$y_e=\tilde\varphi(x_j)^T V_e=\big(V_e\cdot\tilde\varphi(x_j)\big)=\beta_e^T\Gamma_j\tag{16}$$

where $\Gamma_j$ is the $j$-th column of the centered projecting kernel matrix $\tilde K'$. Note that the centered kernel matrix $\tilde K$ is used to obtain the eigenvalues $\lambda$ and eigenvectors $\beta$, whereas the centered projecting kernel matrix $\tilde K'$ is used to compute $y_e$ as in (16); neither matrix can be omitted in the training procedure.

Employing all these results, we obtain the third property.

[Property 3] The Gaussian probability density function $G(\varphi(x_j)\mid\theta_l)$ can be approximated by $\hat G(\varphi(x_j)\mid\theta_l)$ as shown in (12), where $\rho$, $\varepsilon^2(x_j)$ and $y_e$ are given by (14), (15) and (16) respectively.

We stress that the approximation of $G(\varphi(x_j)\mid\theta_l)$ by $\hat G(\varphi(x_j)\mid\theta_l)$ is complete, since it represents not only the principal subspace but also the orthogonal complement subspace.

From these three properties, we can draw the following conclusions:

- The Mercer kernel trick is introduced to indirectly compute dot products in the high dimensional feature space.
- The probability that a sample belongs to the $l$-th component can be computed through the centered kernel matrix and the centered projecting kernel matrix, instead of through the mean vector and covariance matrix as in traditional GMM.
- Since a full eigendecomposition of the centered kernel matrix is not needed, we can approximate the Gaussian density with only the largest $d_\varphi$ principal components of the centered kernel matrix, and the approximation does not depend strongly on $d_\varphi$, since it is complete and optimal.

With these three properties, we can formulate our kernel Gaussian Mixture Model.

3.3 Kernel GMM and the Parameter Estimation Algorithm

In feature space, kernel matrices replace the mean and covariance matrices of the input space in representing each Gaussian component.
So the parameters of each component are not mean vectors and covariance matrices, but kernel matrices. In fact, it is also intractable to compute mean vectors and covariance matrices in feature space, because the feature space has a very high or even infinite
dimension. Fortunately, with the kernel trick embedded, computing with the kernel matrix is quite feasible, since the dimension of the principal subspace is bounded by the data size $N$. Consequently, the parameters that kernel GMM needs to estimate are the prior probability $\alpha_l^{(t)}$, the centered kernel matrix $\tilde K_l^{(t)}$, the centered projecting kernel matrix $(\tilde K_l^{(t)})'$, and $w_l^{(t)}$ (see Table 1). That is to say, the $M$-component kernel GMM is determined by the parameters $\theta_l=\big(\alpha_l^{(t)},w_l^{(t)},\tilde K_l^{(t)},(\tilde K_l^{(t)})'\big)$, $l=1,\dots,M$. According to the properties in the previous sections, the EM algorithm for parameter estimation of kGMM can be summarized as in Table 2. Assuming the number of Gaussian components is $M$, we initialize the posterior probabilities $p_{li}$ that each sample belongs to each Gaussian component. The algorithm terminates when it converges or when the preset maximum number of iterations is reached.

Table 2. Parameter Estimation Algorithm for kGMM

Step 0. Initialize all $p_{li}^{(0)}$ ($l=1,\dots,M$; $i=1,\dots,N$), set $t=0$, set $t_{\max}$ and the stopping condition to false.
Step 1. While the stopping condition is false, set $t=t+1$ and do Steps 2-7.
Step 2. Compute $\alpha_l^{(t)}$, $w_{li}^{(t)}$, $W_l^{(t)}$, $(W_l^{(t)})'$, $K_l^{(t)}$ and $(K_l^{(t)})'$ according to the notation in Table 1.
Step 3. Compute the matrices $\tilde K_l^{(t)}$, $(\tilde K_l^{(t)})'$ via Property 1.
Step 4. Compute the largest $d_\varphi$ eigenvalues and eigenvectors of the centered kernel matrices $\tilde K_l^{(t)}$.
Step 5. Compute $\hat G(\varphi(x_j)\mid\theta_l^{(t)})$ via Property 3.
Step 6. Compute all posterior probabilities $p_{li}^{(t)}$ via (9).
Step 7. Test the stopping condition: if $t>t_{\max}$ or $\sum_{l=1}^{M}\sum_{i=1}^{N}\big(p_{li}^{(t)}-p_{li}^{(t-1)}\big)^2<\varepsilon$, set the stopping condition to true; otherwise loop back to Step 1.

3.4 Discussion of the Algorithm

Computational Cost and Speedup Techniques for Large Scale Problems. With the kernel trick, the computational cost of kernel eigendecomposition based methods is dominated by the eigendecomposition step. The cost therefore depends mainly on the size of the kernel matrix, i.e. the size of the data set. If the size N is not very large (e.g.
N ≤ 1,000), it is not a problem to obtain the full eigendecomposition. If N is large, however, the computation becomes prohibitive: for N > 5,000 it is impossible to finish a full eigendecomposition within hours even on the fastest current PCs. Yet N is usually very large in problems such as data mining. Fortunately, as pointed out in Section 3.2, we need not obtain the full eigendecomposition for the components of kGMM; we only need to estimate
the largest $d_\varphi$ nonzero eigenvalues and corresponding eigenvectors. For large scale problems we may assume $d_\varphi\ll N$. Several techniques can estimate the largest $d_\varphi$ eigen-components for kernel methods. The first is based on traditional Orthogonal Iteration or Lanczos Iteration. The second makes the kernel matrix sparse via sampling techniques [1]. The third applies the Nyström method to speed up kernel machines [19]. However, these three techniques are somewhat complicated. In this paper we adopt a much simpler but practical technique proposed by Shawe-Taylor et al. [20]. It assumes that all samples forming the kernel matrix of each component are drawn from an underlying density $p(x)$, so that the problem can be written as a continuous eigenproblem:

$$\int k(x,y)\,p(x)\,\beta_i(x)\,dx=\lambda_i\beta_i(y)\tag{17}$$

where $\lambda_i$, $\beta_i(y)$ are an eigenvalue and eigenfunction, and $k$ is a given kernel function. The integral can be approximated by the Monte Carlo method using a subset of samples $\{x_i\}_{i=1}^{m}$ ($m\ll N$, $m\ge d_\varphi$) drawn according to $p(x)$:

$$\int k(x,y)\,p(x)\,V_i(x)\,dx\approx\frac{1}{m}\sum_{j=1}^{m}k(x_j,y)\,V_i(x_j)\tag{18}$$

Plugging in $y=x_k$ for $k=1,\dots,m$, we obtain a matrix eigenproblem:

$$\frac{1}{m}\sum_{j=1}^{m}k(x_j,x_k)\,V_i(x_j)=\hat\lambda_i V_i(x_k)\tag{19}$$

where $\hat\lambda_i$ is the approximation of the eigenvalue $\lambda_i$. This approximation has been proved feasible and has bounded error. We apply it in our parameter estimation algorithm for large scale problems ($N>1{,}000$). In our algorithm, the underlying density of the $l$-th component is approximated by $\hat G(\varphi(x)\mid\theta_l^{(t)})$. We sample a subset of size $m_l^{(t)}$ according to $\hat G(\varphi(x)\mid\theta_l)$ and perform a full eigendecomposition on this subset to obtain the largest $d_\varphi$ eigen-components. With this Monte Carlo sampling technique, the computational cost on large scale problems is reduced greatly, as is the memory needed by the parameter estimation algorithm. This makes the proposed kGMM efficient and practical.

Comparison with Related Works.
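The speedup in (17)-(19) amounts to eigendecomposing the kernel matrix of a small subsample and rescaling by $1/m$. A rough sketch under the assumption of an RBF kernel (function names and the uniform subsample are ours; the paper samples according to the component density):

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def approx_top_eigs(X, m, d_phi, seed=0):
    """Monte Carlo approximation of (17)-(19): eigendecompose the
    kernel matrix of an m-sample subset and rescale by 1/m."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    Km = rbf(X[idx], X[idx])
    evals = np.sort(np.linalg.eigvalsh(Km))[::-1]
    return evals[:d_phi] / m        # hat-lambda_i of equation (19)

# usage: the subset estimate tracks the full-matrix eigenvalues / N
X = np.random.default_rng(1).standard_normal((200, 2))
approx = approx_top_eigs(X, m=50, d_phi=3)
full = np.sort(np.linalg.eigvalsh(rbf(X, X)))[::-1][:3] / 200.0
```

The eigendecomposition cost drops from $O(N^3)$ to $O(m^3)$, which is the source of the claimed speedup.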
There is still other work related to ours. One major example is the spectral clustering algorithm [21]. Spectral clustering can be regarded as using an RBF based kernel method to extract features and then performing clustering with K-means. Compared with spectral clustering, the proposed kGMM has at least two advantages: (1) kGMM provides results in a probabilistic framework and can incorporate prior information easily; (2) kGMM can be used in supervised learning problems as a density estimation method. These advantages encourage us to apply the proposed kGMM.
Misunderstanding. We address a possible misunderstanding of the proposed model: that one could simply run GMM in the reduced dimension space obtained by Kernel PCA and achieve the same result as kGMM, i.e. project the data onto the first $d_\varphi$ dimensions of the feature space by Kernel PCA and then perform GMM parameter estimation in that principal subspace. However, the choice of a proper $d_\varphi$ is critical there, and performance would depend entirely on it. If $d_\varphi$ is too large, estimation is not feasible, because probability estimation demands that the number of samples be large relative to the dimension $d_\varphi$, and the computational cost also increases greatly. On the contrary, a small $d_\varphi$ leaves the estimated parameters unable to represent the data well. The proposed kGMM does not have this problem, since the approximated density function is complete and optimal under the minimum Kullback-Leibler divergence criterion. Moreover, kGMM allows different components to use different $d_\varphi$. All this improves the flexibility and broadens the applicability of the proposed kGMM.

4 Experiments

In this section, two experiments validate the proposed kGMM against traditional GMM. First, kGMM is employed as an unsupervised learning (clustering) method on synthetic data sets. Second, kGMM is employed as a supervised density estimation method for real-world handwritten digit recognition.

4.1 Synthetic Data Clustering Using Kernel GMM

To provide an intuitive comparison between the proposed kGMM and traditional GMM, we first conduct experiments on synthetic 2-D data sets. The data sets, each with 1,000 samples, are depicted in Figures 1 and 2. For traditional GMM, all samples are used to estimate the parameters of a two-component mixture of Gaussians. When the algorithm stops, each sample is assigned to one of the components (clusters) according to its posterior.
The clustering results of traditional GMM are shown in Figure 1(a) and Figure 2(a); the results are obviously not satisfactory. However, using kGMM with a polynomial kernel of degree 2, $d_\varphi=4$ for each Gaussian component, and the same clustering scheme as traditional GMM, we achieve the promising results shown in Figure 1(b) and Figure 2(b). Besides, kGMM provides probabilistic information, as in Figure 1(c) and Figure 2(c), which most classical kernel methods cannot provide.

4.2 USPS Data Set Recognition Using Kernel GMM

Kernel GMM is also applied to a real-world problem, US Postal Service (USPS) handwritten digit recognition. The data set consists of 9,226 grayscale
Fig. 2. Data set of 1,000 samples; points with different markers belong to the two clusters. (a) is the partition result of traditional GMM; (b) is the result achieved by kGMM; (c) shows the probability that each point belongs to the left-right cluster. The whiter the point, the higher the probability.

images of size 16×16, divided into a training set of 7,219 images and a test set of 2,007 images. The original input data is just the vector form of the digit image, i.e., the input feature space has dimensionality 256. Optionally, a linear discriminant analysis (LDA) can be performed to reduce the dimensionality of the feature space; if LDA is performed, the feature dimensionality becomes 39. For each category $\omega$, a density $p(x\mid\omega)$ is estimated using a 4-component GMM on the training set. To classify a test sample $x$, we use the Bayesian decision rule

$$\omega^{*}=\arg\max_{\omega}\big\{p(x\mid\omega)P(\omega)\big\},\quad \omega=1,\dots,10\tag{20}$$

where $P(\omega)$ is the prior probability of category $\omega$. In this experiment we set $P(\omega)=1/10$, i.e. all categories have equal prior probability. For comparison, kGMM adopts the same experimental scheme as traditional GMM, except that it uses an RBF kernel $k(x,x')=\exp(-\gamma\|x-x'\|^2)$ with $\gamma=0.0015$, 2 Gaussian mixture components, and $d_\varphi=40$ for each component. The experimental results of GMM in the original input space, GMM in the LDA space (from [4]), and kGMM are shown in Table 3. We can see that kGMM, with fewer components, performs markedly better than traditional GMM with or without LDA. Although the kGMM result is not the state-of-the-art result on USPS, it could be improved further by incorporating invariance prior knowledge using tangent distance as in [4].
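The decision rule (20) is just an arg-max over per-class weighted densities. A minimal sketch (ours, with a hypothetical `class_log_density` interface and log densities for numerical stability):

```python
import numpy as np

def classify(x, class_log_density, priors):
    """Bayes decision rule (20): pick the class maximizing p(x|w) P(w)."""
    scores = np.array([class_log_density[w](x) + np.log(priors[w])
                       for w in range(len(priors))])
    return int(np.argmax(scores))

# toy usage: two 1-D Gaussian class log-densities (up to a shared constant)
densities = [
    lambda x: -0.5 * (x - (-2.0)) ** 2,   # class 0 centered at -2
    lambda x: -0.5 * (x - 2.0) ** 2,      # class 1 centered at +2
]
assert classify(-1.5, densities, [0.5, 0.5]) == 0
assert classify(2.5, densities, [0.5, 0.5]) == 1
```

In the paper's setting, each `class_log_density[w]` would be the log of a per-digit GMM or kGMM density.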
Table 3. Comparison results on the USPS data set

Method      | Best error rate
GMM         | 8.0%
LDA+GMM     | 6.7%
Kernel GMM  | 4.3%

5 Conclusion

In this paper, we present a kernel Gaussian Mixture Model and deduce a parameter estimation algorithm by embedding the kernel trick into the EM algorithm. Furthermore, we adopt a Monte Carlo sampling technique to speed up kGMM on large scale problems, making it more practical and efficient. Compared with most classical kernel methods, kGMM can solve problems in a probabilistic framework. Moreover, it can tackle nonlinear problems better than traditional GMM. Experimental results on synthetic and real-world data sets show that the proposed approach performs satisfactorily. Our future work will focus on incorporating prior knowledge, such as invariance, into kGMM, and on enriching its applications.

Acknowledgements. The authors would like to thank the anonymous reviewers for their helpful comments, and Jason Xu for helpful conversations about this work.

References

1. Achlioptas, D., McSherry, F. and Schölkopf, B.: Sampling techniques for kernel methods. In Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge, MA (2002)
2. Bilmes, J. A.: A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report, UC Berkeley, ICSI-TR (1997)
3. Bishop, C. M.: Neural Networks for Pattern Recognition. Oxford University Press (1995)
4. Dahmen, J., Keysers, D., Ney, H. and Güld, M.O.: Statistical Image Object Recognition using Mixture Densities. Journal of Mathematical Imaging and Vision, 14(3) (2001)
5. Duda, R. O., Hart, P. E. and Stork, D. G.: Pattern Classification. New York: John Wiley & Sons, 2nd Edition (2001)
6. Everitt, B. S.: An Introduction to Latent Variable Models. London: Chapman and Hall (1984)
7. Bach, F. R. and Jordan, M. I.: Kernel Independent Component Analysis. Journal of Machine Learning Research, 3 (2002)
8. Gestel, T.
V., Suykens, J.A.K., Lanckriet, G., Lambrechts, A., De Moor, B. and Vandewalle, J.: Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis. Neural Computation, 15(5) (2002)
9. Herbrich, R., Graepel, T. and Campbell, C.: Bayes Point Machines: Estimating the Bayes Point in Kernel Space. In Proceedings of the International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines (1999)
10. Kwok, J. T.: The Evidence Framework Applied to Support Vector Machines. IEEE Trans. on Neural Networks, Vol. 11 (2000)
11. Mika, S., Rätsch, G., Weston, J., Schölkopf, B. and Müller, K.R.: Fisher discriminant analysis with kernels. IEEE Workshop on Neural Networks for Signal Processing IX (1999)
12. Mjolsness, E. and DeCoste, D.: Machine Learning for Science: State of the Art and Future Prospects. Science, Vol. 293 (2001)
13. Roberts, S. J.: Parametric and Non-Parametric Unsupervised Cluster Analysis. Pattern Recognition, Vol. 30, No. 2 (1997)
14. Schölkopf, B., Smola, A.J. and Müller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5) (1998)
15. Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K. R., Rätsch, G. and Smola, A.: Input Space vs. Feature Space in Kernel-Based Methods. IEEE Trans. on Neural Networks, Vol. 10, No. 5 (1999)
16. Schölkopf, B. and Smola, A. J.: Learning with Kernels: Support Vector Machines, Regularization and Beyond. MIT Press, Cambridge, MA (2002)
17. Tipping, M. E.: Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research (2001)
18. Vapnik, V.: The Nature of Statistical Learning Theory. 2nd Edition, Springer-Verlag, New York (1997)
19. Williams, C. and Seeger, M.: Using the Nyström Method to Speed Up Kernel Machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems (NIPS) 13. MIT Press, Cambridge, MA (2001)
20. Shawe-Taylor, J., Williams, C., Cristianini, N. and Kandola, J.: On the Eigenspectrum of the Gram Matrix and Its Relationship to the Operator Eigenspectrum. In N. Cesa-Bianchi et al. (Eds.): ALT 2002, LNAI 2533, Springer-Verlag, Berlin Heidelberg (2002)
21. Ng, A. Y., Jordan, M. I.
and Weiss, Y.: On Spectral Clustering: Analysis and an Algorithm. Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge, MA (2002)
22. Moghaddam, B. and Pentland, A.: Probabilistic visual learning for object representation. IEEE Trans. on PAMI, Vol. 19, No. 7 (1997)

Appendix

For convenience, the subscripts representing Gaussian components and the superscripts representing iterations are omitted in this part.

[Lemma 1] Suppose $\varphi(\cdot)$ is a mapping function satisfying Mercer's conditions as in Equation (1) (Section 2), with $N$ training samples $X=\{x_i\}_{i=1}^{N}$. Let $w=[w_1,\dots,w_N]^T\in\mathbb{R}^N$ be an $N$-dimensional column vector. Define $\mu=\sum_{i=1}^{N}\varphi(x_i)w_i^2$ and $\tilde\varphi(x_i)=\varphi(x_i)-\mu$. Then the following hold:

(1) If $K$ is the $N\times N$ kernel matrix with $K_{ij}=\big(w_i\varphi(x_i)\cdot w_j\varphi(x_j)\big)$, and $\tilde K$ is the $N\times N$ matrix centered in feature space such that
$\tilde K_{ij}=\big(w_i\tilde\varphi(x_i)\cdot w_j\tilde\varphi(x_j)\big)$, then:

$$\tilde K = K - WK - KW + WKW\tag{a1}$$

where $W=ww^T$.

(2) If $K'$ is the $N\times N$ projecting kernel matrix with $K'_{ij}=\big(\varphi(x_i)\cdot w_j\varphi(x_j)\big)$, and $\tilde K'$ is the $N\times N$ matrix centered in feature space with $\tilde K'_{ij}=\big(\tilde\varphi(x_i)\cdot w_j\tilde\varphi(x_j)\big)$, then

$$\tilde K' = K' - W'K - K'W + W'KW\tag{a2}$$

where $W'=\mathbf{1}_N w^T$ and $\mathbf{1}_N$ is the $N$-dimensional column vector with all entries equal to 1.

Proof: (1)

$$\tilde K_{ij}=\big(w_i\tilde\varphi(x_i)\cdot w_j\tilde\varphi(x_j)\big)=w_i\tilde\varphi(x_i)^T w_j\tilde\varphi(x_j)$$
$$=\Big(w_i\big(\varphi(x_i)-\textstyle\sum_{k=1}^{N}\varphi(x_k)w_k^2\big)\Big)^T\Big(w_j\big(\varphi(x_j)-\textstyle\sum_{k=1}^{N}\varphi(x_k)w_k^2\big)\Big)$$
$$=\big(w_i\varphi(x_i)^T w_j\varphi(x_j)\big)-\sum_{k=1}^{N}w_i w_k\big(w_k\varphi(x_k)^T w_j\varphi(x_j)\big)-\sum_{k=1}^{N}w_k\big(w_i\varphi(x_i)^T w_k\varphi(x_k)\big)w_j$$
$$\qquad+\,w_i\sum_{k=1}^{N}\sum_{n=1}^{N}w_k w_n\big(w_k\varphi(x_k)^T w_n\varphi(x_n)\big)w_j$$
$$=K_{ij}-w_i\sum_{k=1}^{N}w_k K_{kj}-\sum_{k=1}^{N}w_k K_{ik}\,w_j+w_i\sum_{k=1}^{N}\sum_{n=1}^{N}w_k w_n K_{kn}\,w_j$$

from which we obtain the compact expression (a1). (2) is proved similarly.

[Lemma 2] Suppose $\Sigma$ is a covariance matrix such that

$$\Sigma=\sum_{i=1}^{N}\tilde\varphi(x_i)\tilde\varphi(x_i)^T w_i^2\tag{a3}$$

Then the following hold:

(1) $\Sigma V=\lambda V \iff \tilde K\beta=\lambda\beta$.
(2) $V=\sum_{i=1}^{N}\beta_i\tilde\varphi(x_i)w_i$.

Proof: (1) First we prove $\Rightarrow$. If $\Sigma V=\lambda V$, then the solution $V$ lies in the space spanned by $w_1\tilde\varphi(x_1),\dots,w_N\tilde\varphi(x_N)$, and we have two useful consequences. First, we can consider the equivalent equations

$$\lambda\big(w_k\tilde\varphi(x_k)\cdot V\big)=\big(w_k\tilde\varphi(x_k)\cdot\Sigma V\big),\quad\text{for all } k=1,\dots,N\tag{a4}$$
and second, there exist coefficients $\beta_i$ ($i=1,\dots,N$) such that

$$V=\sum_{i=1}^{N}\beta_i w_i\tilde\varphi(x_i)\tag{a5}$$

Combining (a4) and (a5), we get

$$\lambda\sum_{i=1}^{N}\beta_i\big(w_k\tilde\varphi(x_k)\cdot w_i\tilde\varphi(x_i)\big)=\sum_{i=1}^{N}\beta_i\sum_{j=1}^{N}\big(w_k\tilde\varphi(x_k)\cdot w_j\tilde\varphi(x_j)\big)\big(w_j\tilde\varphi(x_j)\cdot w_i\tilde\varphi(x_i)\big)$$

which reads $\lambda\tilde K\beta=\tilde K^2\beta$, where $\beta=[\beta_1,\dots,\beta_N]^T$. Since $\tilde K$ is a symmetric matrix, it has a set of eigenvectors spanning the whole space, and thus

$$\lambda\beta=\tilde K\beta\tag{a6}$$

The direction $\Leftarrow$ is proved similarly. (2) is easy to prove, so the proof is omitted.

[Lemma 3] If $x\in\mathbb{R}^d$ is a sample, with $\tilde\varphi(x)=\varphi(x)-\mu$, then

$$\big(V\cdot\tilde\varphi(x_j)\big)=\sum_{i=1}^{N}\beta_i\big(w_i\tilde\varphi(x_i)\cdot\tilde\varphi(x_j)\big)=\beta^T\Gamma_j\tag{a7}$$

where $\Gamma_j$ is the $j$-th column of the centered projecting matrix $\tilde K'$.

Proof: This lemma follows from (a5).