Discriminant Analysis: A Unified Approach
Peng Zhang & Jing Peng
Tulane University
Electrical Engineering & Computer Science Department
New Orleans, LA 70118
{zhangp,jp}@eecs.tulane.edu

Norbert Riedel
Tulane University
Mathematics Department
New Orleans, LA 70118
riedel@math.tulane.edu

Abstract

Linear discriminant analysis (LDA) as a dimension reduction method is widely used in data mining and machine learning. It however suffers from the small sample size (SSS) problem when the data dimensionality is greater than the sample size. Many modified methods have been proposed to address some aspect of this difficulty from a particular viewpoint. A comprehensive framework that provides a complete solution to the SSS problem is still missing. In this paper, we provide a unified approach to LDA and investigate the SSS problem in the framework of statistical learning theory. This unified approach results in a deeper understanding of LDA. We demonstrate that LDA (and its nonlinear extension) belongs to the same framework in which powerful classifiers such as support vector machines (SVMs) are formulated. In addition, the approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical analysis.

1 Introduction

The purpose of this paper is to present a unified theoretical framework for discriminant analysis. Discriminant analysis [6, 10] has been successfully used as a dimensionality reduction technique for many classification problems. According to Fisher's criterion, one has to find a projection matrix W that maximizes

J(W) = W^T S_b W / W^T S_w W,   (1)

where S_b and S_w are the so-called between-class and within-class scatter matrices, respectively. In practice, the small sample size (SSS) problem is often encountered, where S_w is singular; the maximization problem can therefore be difficult to solve. To address this issue, the term εI is added, where ε is a small positive number and I the identity matrix of proper size. This results in maximizing

J(W) = W^T S_b W / W^T (S_w + εI) W.   (2)
It can then be solved without any numerical problems. In [9], Friedman discusses regularized discriminant analysis with regard to the small sample size problem; Equation (2) is in practice a special case of his regularized discriminant analysis. However, Friedman leaves open the question of whether an optimal choice for the parameter ε exists and, if so, whether it is unique. Further, Friedman's analysis is based on traditional parametric methods in statistics, where Gaussian distributions for the underlying random variables are often assumed. That is, his regularization technique is one of improving the estimation of covariance matrices that are possibly biased due to the small data size. The validity of this technique can be very limited. For example, the nonlinear case of discriminant analysis, called generalized discriminant analysis (GDA), has become very popular in practice and shows promising potential in many applications. The idea is to map the data into a kernel-induced feature space and then perform discriminant analysis there. However, in a high and possibly infinite dimensional feature space, GDA is not well justified from Friedman's point of view. Thus, a rigorous justification for the regularization term εI is still missing, especially when GDA is considered.

In [7], Evgeniou, Pontil and Poggio present a theoretical framework for the regularization functional (the so-called regularization network)

min_{f∈H} H[f] = (1/ℓ) Σ_{i=1}^{ℓ} (y_i − f(x_i))² + λ‖f‖²_K,   (3)

where ‖f‖²_K is a norm in a Reproducing Kernel Hilbert Space H defined by the positive definite function K, ℓ is the number of data points or examples, and λ is a regularization parameter. In their paper they justify the above functional
using statistical learning theory as developed largely by Vapnik [22]. In this paper, we show that discriminant analysis can be cast in the framework of statistical learning theory. More specifically, Fisher's criterion (1) is equivalent to the regularization network (3) without regularization, that is,

min_{f∈H} H[f] = (1/ℓ) Σ_{i=1}^{ℓ} (y_i − f(x_i))².

And Equation (2) is equivalent to the regularization network (3), with the parameter ε corresponding to the regularization term λ. Thus the role of εI is well justified by statistical learning theory. Further, we show that the optimal choice of λ exists and that it is unique. Additionally, we establish an error bound for the function that discriminant analysis tries to find.

1.1 Related Work

The relationship between least squares regression and Fisher's linear discriminant has been well known for a long time; there is a good review in [6]. Actually, in the seminal paper [8], Fisher already pointed out its connection to the regression solution. In [17], Mika also provides an extensive review of LDA. Another widely used technique in statistics, canonical correlation analysis, is also closely related to least squares regression [1].

Regularized least squares (RLS) methods have been studied for a long time, under different names. In statistics, ridge regression [12] has been very popular for solving badly conditioned linear regression problems. After Tikhonov published his book [21], it was realized that ridge regression uses the regularization term in Tikhonov's sense. In the 1980s, weight decay was proposed to help prune unimportant neural network connections, and it was soon recognized [13] that weight decay is equivalent to ridge regression. Recently this old method was found to be essential in the framework of statistical learning theory, under different labels (RLS [18], regularization networks [7], least squares SVMs [19], and proximal SVMs [11]). It has been demonstrated that this regression method is fully comparable to SVMs when used as a classifier [18, 24].
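The ridge-regression form of RLS described above takes only a few lines. The following is a minimal sketch, shown in the regime where regularization matters most (more features than samples); the data and the value of λ are made up for illustration, not taken from the paper:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y, the regularized least squares problem."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Tiny example with more features than samples (the SSS regime):
# ordinary least squares is ill-posed here, but the ridge system is not,
# because X^T X + lam*I is positive definite for any lam > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))   # 5 samples, 20 features
y = rng.normal(size=5)
w = ridge_fit(X, y, lam=1e-2)
assert w.shape == (20,)
assert np.isfinite(w).all()
```

For any positive λ the system has a unique, stable solution, which is exactly the well-posedness property exploited throughout the paper.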
Most recently, an error bound for RLS given a finite sample data set was developed in [5]. Many elements discussed in this paper are scattered throughout the literature and may look familiar to some readers but not to others. Most treatments, however, are not comprehensive, and unclear issues remain. For example, many researchers, such as Mika [17], apply the regularization technique to address the SSS problem but still follow Friedman's principle, which is quite limited. Our goal is to adopt a simple yet unified approach that makes this topic more comprehensive.

2 Discriminant Learning Analysis

In this section, we first review LDA using Fisher's criterion, and then study it from a learning theory point of view, an approach we call discriminant learning analysis (DLA).

2.1 Linear Discriminant Analysis

In LDA, within-class, between-class, and mixture scatter matrices are used to formulate criteria of class separability. Consider a J-class problem, where m_0 is the mean vector of all data, and m_j is the mean vector of the j-th class. A within-class scatter matrix characterizes the scatter of samples around their respective class mean vectors:

S_w = Σ_{j=1}^{J} Σ_{i=1}^{ℓ_j} (x_i^j − m_j)(x_i^j − m_j)^T,

where ℓ_j is the size of the data in the j-th class. A between-class scatter matrix characterizes the scatter of the class means around the mixture mean m_0:

S_b = Σ_{j=1}^{J} ℓ_j (m_j − m_0)(m_j − m_0)^T.

The mixture scatter matrix is the covariance matrix of all samples, regardless of their class assignments:

S_m = Σ_{i=1}^{ℓ} (x_i − m_0)(x_i − m_0)^T = S_w + S_b.

Fisher's criterion is used to find the projection matrix that maximizes (1). In order to determine the matrix W that maximizes J(W), one can solve the generalized eigenvalue problem

S_b w_i = λ_i S_w w_i.

The eigenvectors corresponding to the largest eigenvalues form the columns of W. For a two-class problem, this can be written in a simpler form:

S_w w = m̃ = m_1 − m_2,   (4)

where m_1 and m_2 are the means of the two classes.
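The scatter matrices and the two-class solution (4) are straightforward to compute. The following sketch uses synthetic data (it is not the paper's code) and also checks the identity S_m = S_w + S_b stated above:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (S_w) and between-class (S_b) scatter matrices.
    X: (n_samples, n_features); labels: integer class ids."""
    m0 = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))
    Sb = np.zeros((n_features, n_features))
    for j in np.unique(labels):
        Xj = X[labels == j]
        mj = Xj.mean(axis=0)
        D = Xj - mj
        Sw += D.T @ D                                   # scatter around class mean
        Sb += len(Xj) * np.outer(mj - m0, mj - m0)      # class means around m0
    return Sw, Sb

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(2, 1, (40, 3))])
labels = np.array([0] * 30 + [1] * 40)
Sw, Sb = scatter_matrices(X, labels)

# Mixture scatter equals within-class plus between-class scatter.
Xc = X - X.mean(axis=0)
Sm = Xc.T @ Xc
assert np.allclose(Sm, Sw + Sb)

# Two-class Fisher direction: solve S_w w = m_1 - m_2 (Equation (4)).
m_diff = X[labels == 0].mean(axis=0) - X[labels == 1].mean(axis=0)
w = np.linalg.solve(Sw, m_diff)
assert np.allclose(Sw @ w, m_diff)
```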
2.2 LDA as Least Squares Function Estimation

LDA is a supervised subspace learning problem where we are given a set z = {(x_i, y_i)}_{i=1}^{ℓ} of training examples drawn i.i.d. from the probability space Z = X × Y, on which a probability measure ρ is defined; the x_i are the n-dimensional inputs. We first consider two-class problems, with class labels y_i ∈ {1, −1}. Without loss of generality, we assume x has zero mean, i.e., E(x) = 0. Let us consider

f_MSE = arg min_f Σ_{i=1}^{ℓ} (y_i − f(x_i))².   (5)

This is a mean squared function estimation problem. Here we first consider the hypothesis space H_L, the function class of linear projections: H_L = {f | f(x) = w^T x, x ∈ X}. Thus we will solve

w_opt = arg min_w Σ_{i=1}^{ℓ} (y_i − w^T x_i)².   (6)
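The least squares problem (6) can be solved directly and compared with the Fisher direction S_w^{-1}(m_1 − m_2). A sketch under the text's assumptions (labels ±1, zero-mean inputs), with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (25, 4)), rng.normal(1, 1, (25, 4))])
y = np.array([1.0] * 25 + [-1.0] * 25)
X = X - X.mean(axis=0)            # enforce E(x) = 0 as assumed in the text

# Least squares solution of (6): argmin_w sum_i (y_i - w^T x_i)^2.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fisher direction: solve S_w w = m_1 - m_2.
m1 = X[y == 1].mean(axis=0)
m2 = X[y == -1].mean(axis=0)
D1 = X[y == 1] - m1
D2 = X[y == -1] - m2
Sw = D1.T @ D1 + D2.T @ D2
w_f = np.linalg.solve(Sw, m1 - m2)

# The two directions agree up to a positive scale factor.
cos = w_ls @ w_f / (np.linalg.norm(w_ls) * np.linalg.norm(w_f))
assert cos > 0.999
```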
It turns out that the solution of (6) is the same as the solution obtained by Fisher's criterion.

Lemma 1 The linear system derived by the least squares criterion (6) is equivalent to the one derived by Fisher's criterion (1), up to a constant, in two-class problems.

Proof: We rewrite (6) as

Σ_{i=1}^{ℓ} (y_i − w^T x_i)² = (y − X^T w)^T (y − X^T w)
= (y^T − w^T X)(y − X^T w)
= ℓ − 2(ℓ_1 m_1 − ℓ_2 m_2)^T w + w^T S_m w,

where we use the fact that Xy = ℓ_1 m_1 − ℓ_2 m_2. Taking the derivative with respect to w and setting the result to 0, we obtain

S_m w = ℓ_1 m_1 − ℓ_2 m_2.   (7)

This is equivalent to Equation (4). When S_w is full rank, the three equations S_m w = ℓ_1 m_1 − ℓ_2 m_2, S_m w = m_1 − m_2, and S_w w = m_1 − m_2 have the same solution w up to a constant factor, given that the overall mean is 0. For details see Appendix A.

2.3 Interpretation of LDA within Learning Theory

When both classes are Gaussian distributed, LDA can be shown to be optimal. More often in reality they are not, and then it is not clear how good LDA is; the conventional view of LDA is limited here. The learning framework, on the other hand, is more general: there is no need to assume a distribution for the data in order to discuss goodness. Instead, the learning approach directly aims to find a good function mapping, and the goodness is well defined (see Appendix C). In a general sense, LDA tries to find a transformation f such that the transformed data have a large class mean difference and small within-class scatter. The equivalence between (6) and (1) can be understood in another way: one minimizes the mean squared error with respect to each class mean while keeping the mean difference constant.

We define f_ρ as the best function minimizing the mean squared error:

f_ρ = arg min_f ∫_Z (y − f(x))² dρ.   (8)

Given a set of data z, one tries to learn the function f_z that is as close as possible to the optimal mapping f_ρ, that is, f_opt = arg min_f ∫_Z (f_z − f_ρ)² dρ. However, the probability measure ρ is unknown and so is f_ρ.
Instead, we may consider (5), which is called empirical error (risk) minimization. Because (6) is equivalent to Fisher's criterion, it is clear that Fisher's criterion leads to an empirical error minimization problem. From the statistical learning theory point of view, however, without controlling the function norm, solving equation (5) often leads to overfitting the data, and the solution is not stable. And when the sample size is smaller than the dimensionality, it is an ill-posed problem and the solution is not unique. That is where diverse LDA techniques have been developed to solve the SSS problem.

2.4 Discriminant Learning Analysis as Regularized Least Squares Regression

Following [7], we instead minimize over a hypothesis space H the following regularized functional for a fixed positive parameter λ:

f_DLA = arg min_{f∈H} Σ_{i=1}^{ℓ} (y_i − f(x_i))² + λ‖f‖²_H.   (9)

Again, we consider the linear projection function space H_L. In this case ‖f‖² = w^T w, and

w_opt = arg min_w Σ_{i=1}^{ℓ} (y_i − w^T x_i)² + λ w^T w.   (10)

Thus,

Σ_{i=1}^{ℓ} (y_i − w^T x_i)² + λ w^T w = ℓ − 2(ℓ_1 m_1 − ℓ_2 m_2)^T w + w^T (S_m + λI) w.

Taking the derivative with respect to w and setting the result to 0, we have

(S_m + λI) w = ℓ_1 m_1 − ℓ_2 m_2.   (11)

In Appendix A, it is shown that (11) is equivalent to

(S_w + λI) w = m̃,   (12)

and therefore solving it is equivalent to maximizing (2). We call this approach discriminant learning analysis. Since for any given positive regularization term λ, (9) always has a unique solution, we have

Proposition 2 Discriminant learning analysis is a well-posed problem.

2.5 Nonlinear DLA

In the framework of discriminant learning analysis, the nonlinear case is not an ad hoc extension of the linear case by the kernel trick. In the above analysis we adopted the linear hypothesis space H_L and looked for an optimal solution in that space. If we instead choose a general Reproducing Kernel Hilbert Space (RKHS) as our hypothesis space, we obtain nonlinear function mappings as the subspace. The derivation of DLA in a general function space is carried out in terms of functional analysis, in a way similar to the linear case; we skip the details of the derivation. Briefly, for a two-class problem, the nonlinear DLA solution is found by a simple algorithm [18]:

(K + λI) c = y,   (13)

where K = [k(x_i, x_j)] is the Gram matrix of the training data, k is the kernel function, and y is the vector of class labels. After solving for c = [c_1, ..., c_ℓ], any new data point x is projected onto the subspace by s = Σ_{i=1}^{ℓ} c_i k(x_i, x).

2.6 Error Bound for DLA

By setting discriminant learning analysis in the framework of statistical learning theory, we can provide an error bound for the f_DLA obtained by (9):

Theorem 3 Let f_ρ be defined by (8). With confidence 1 − δ, the error bound for the solution f_DLA of (9) is given by

∫_X (f_DLA − f_ρ)² dρ_X ≤ S(λ) + A(λ),   (14)

where A(λ) = λ^{1/2} ‖L_K^{−1/4} f_ρ‖² and S(λ) = (32M²(λ + C_K)² / λ²) v*(ℓ, δ). There exists a unique λ that minimizes S(λ) + A(λ).

This decomposition is often called the bias-variance tradeoff. A(λ) is called the approximation error; given a hypothesis space, it measures the error between the optimal function in that space and the true target. L_K in A(λ) is simply an operator. S(λ) bounds the sample error and is essentially derived from the law of large numbers. M and C_K are constants, and v*(ℓ, δ) is the unique solution of an equation whose coefficients contain the sample size ℓ and the confidence parameter δ. Due to its complex nature, we do not provide the details of the proof, which can be found in [5]. To the best of our knowledge, this is the first known error bound for LDA.

3 Multi-class DLA

We have presented discriminant learning analysis as an alternative approach to LDA in two-class problems. The data is projected onto only one dimension, which is adequate for the two-class problem. However, LDA is generally used to find a subspace with d dimensions for a multi-class problem. In this section we extend DLA to the multi-class case. For a two-class problem, the function f is a mapping from a vector x to a scalar y ∈ R.
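Before turning to d > 1 dimensions, the two-class kernel solution (K + λI)c = y can be sketched in full. The Gaussian kernel below matches the one used later in the experiments, but the toy data and the values of σ and λ are arbitrary choices for illustration:

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / sigma^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def kernel_dla_fit(X, y, lam, sigma):
    """Solve (K + lam*I) c = y for the expansion coefficients c."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_dla_project(X_train, c, X_new, sigma):
    """Project points onto the 1-d subspace: s = sum_i c_i k(x_i, x)."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy two-class data: the sign of the projection recovers the labels.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
c = kernel_dla_fit(X, y, lam=0.1, sigma=2.0)
s = kernel_dla_project(X, c, X, sigma=2.0)
assert ((s > 0) == (y > 0)).mean() > 0.9
```

Since K + λI is positive definite for any λ > 0, the solve never fails, which is the well-posedness property of Proposition 2 carried over to the nonlinear case.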
If we want to find a subspace with d > 1 dimensions, we actually consider a mapping from a vector x to a vector y ∈ R^d. This direct approach has some difficulties. First, statistical learning theory is concerned only with function mappings of the first kind, f : R^n → R, and cannot be directly extended to f : R^n → R^d. Second, the subspace dimensionality d is not fixed (rather, it is often provided by the user). These difficulties make directly casting DLA in the multi-class case as a least squares estimation problem a very difficult task.

Notice that LDA in a multi-class problem can be decomposed into two-class problems. In the i-th two-class problem, the i-th class is treated as one class and all remaining classes as the second class. Each binary class problem is solved first, and after finding all subspaces, PCA is applied to find the eigenvectors having the largest eigenvalues. These new eigenvectors are the solution of the original multi-class LDA problem (see Appendix B). Thus, DLA can be naturally extended to the multi-class case: in the decomposition step, we simply replace two-class LDA by two-class DLA. In the linear case, it turns out that the DLA algorithm in the multi-class case simply solves

S_b W = (S_w + λI) W Λ.   (15)

This is equivalent to maximizing (2).

4 Discussions

In this section, we analyze the role of the regularization term λ. We are especially interested in comparing the DLA algorithm (15) to various LDA algorithms, such as PCA+LDA [2, 20], Scatter-LDA [15, 10], newLDA [3] and DLDA [23]. These techniques were mostly proposed for solving facial recognition problems, where the SSS problem always occurs. We summarize them briefly:

PCA+LDA: Apply PCA to remove the null space of S_w first, then maximize J(W) = W^T S_b W / W^T S_w W.

Scatter-LDA: Same as PCA+LDA but maximizing J(W) = W^T S_b W / W^T S_m W instead.

newLDA: If S_w is full rank, solve regular LDA; else, in the null space of S_w, find the eigenvectors of S_b with the largest eigenvalues.
DLDA: Apply PCA to remove the null space of S_b first, then find the eigenvectors of S_w corresponding to the smallest eigenvalues.

Some researchers also maximize J(W) = W^T S_m W / W^T S_w W; this is equivalent to maximizing J(W) = W^T S_b W / W^T S_w W.
It should be noted that PCA+LDA and Scatter-LDA can be equivalent when S_w and S_m span the same subspace. However, they differ when S_b totally or partially spans the null space of S_w, so that S_w and S_m span different subspaces. For facial images the latter case turns out to be more common. In [3], Chen et al. prove that the null space of S_w contains discriminant information. They also show that Scatter-LDA is not optimal in that it fails to distinguish the most discriminant information in the null space of S_w; thus they proposed the newLDA method. However, newLDA falls short of making use of any information outside of that null space. DLDA, on the other hand, discards the null space of S_b. This is very problematic. First, it fails to distinguish the most discriminant information in the null space of S_w, as does Scatter-LDA. Second, the null space of S_b is not completely useless. This can be seen in the two-class case: DLDA simply chooses m̃ as the subspace, which can be biased because it ignores S_w. It should be noted that [16] contains a misleading claim, where the authors state that there is no significant difference between newLDA and DLDA; this incorrect claim has also been pointed out by others [14].

In a word, all of these techniques make hard decisions, either discarding a null space or working only in a null space. In the following, we show that DLA always works in the whole space, making no hard decisions; instead, it fuses information from both subspaces. DLA can work in the whole space because it is well-posed.

We first consider how LDA is solved. For the moment, let us assume that S_w is full rank, so that S_w^{-1} exists. For the symmetric matrix S_w there is a matrix U such that U^T U = I, S_w = U Σ U^T, and S_w^{-1} = U Σ^{-1} U^T, where Σ is a diagonal matrix; U^T is a transform matrix. Consider the following in a two-class LDA:

w = S_w^{-1} m̃ = U Σ^{-1} U^T m̃.
(16)

The between-class vector m̃ is transformed to a new basis by U^T, then multiplied by the constant Σ_ii^{-1} along each dimension, and finally transformed by U back to the original basis. To illustrate this process, we consider a simple case where the data are two-dimensional. x_1 and x_2 are the original axis bases, as shown in Figure 1, and x̃_1 and x̃_2 are the orthogonal axis bases in which S_w is diagonalized. We choose S_w such that

U^T S_w U = Σ = [ 2  0 ; 0  1/2 ].

Thus Σ_11^{-1} = 1/2 and Σ_22^{-1} = 2. Let m̄ = U^T m̃. Then m̄_1 and m̄_2 are the projections on x̃_1 and x̃_2, respectively. Multiplying by Σ^{-1} scales these to (1/2) m̄_1 and 2 m̄_2. The LDA solution S_w^{-1} m̃ is shown in Figure 1(a) and denoted by LDA.

Now consider DLA: (S_w + λI) w = m̃. Notice that if the matrix U diagonalizes S_w then it diagonalizes S_w + λI at the same time: U^T (S_w + λI) U = U^T S_w U + λ U^T U = Σ + λI. Thus (S_w + λI)^{-1} = U (Σ + λI)^{-1} U^T. To illustrate the effect of the regularization term, we choose λ = 1. In Figure 1, the Λ_i are the eigenvalues of S_w + λI, so now Λ_1^{-1} = 1/3 and Λ_2^{-1} = 2/3. The vector (S_w + λI)^{-1} m̃ is shown in Figure 1(a) and denoted by DLA.

We now consider the case where the SSS problem occurs. We keep Λ_2 = 1.5 the same but set Λ_1 = 1 (that is, the original eigenvalue of S_w is 0 along this dimension). This case is plotted in Figure 1(b). DLA takes advantage of information from both worlds. However, PCA+LDA will throw away the dimension x̃_1 and only keep the direction x̃_2, thus ignoring critical discriminant information. newLDA, on the other hand, will only keep the dimension x̃_1 and throw away any discriminant information along x̃_2. DLDA simply chooses m̃.

Figure 1. Illustration and comparison of the solutions of LDA, DLA, DLDA and newLDA: (a) the solutions of LDA and DLA; (b) the solution of DLA when S_w's eigenvalue is zero in one dimension.

Thus, when the SSS problem does not occur, DLA has a smoothing effect on the eigenvalues of S_w.
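The numbers in the two-dimensional example above can be checked directly. A sketch working in the diagonalizing basis, with Σ = diag(2, 1/2), λ = 1, and an illustrative mean-difference vector:

```python
import numpy as np

# In the basis that diagonalizes S_w (Figure 1), S_w = diag(2, 1/2).
Sigma = np.diag([2.0, 0.5])
m = np.array([1.0, 1.0])   # an arbitrary mean-difference vector, for illustration
lam = 1.0

w_lda = np.linalg.solve(Sigma, m)                    # S_w^{-1} m: scales by 1/2 and 2
w_dla = np.linalg.solve(Sigma + lam * np.eye(2), m)  # eigenvalues become 3 and 3/2

assert np.allclose(w_lda, [0.5, 2.0])
assert np.allclose(w_dla, [1/3, 2/3])    # the 1/3 and 2/3 shown in Figure 1(a)

# SSS case: S_w is singular along x_1 (eigenvalue 0), so plain LDA breaks
# down there, while DLA still fuses both directions.
Sigma_sss = np.diag([0.0, 0.5])
w_dla_sss = np.linalg.solve(Sigma_sss + lam * np.eye(2), m)
assert np.allclose(w_dla_sss, [1.0, 2/3])
```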
If the SSS problem does occur, DLA fuses and keeps a balance between the information both in and out of the null space of S_w. This balance is achieved by choosing a proper λ.

5 Experiments

5.1 FERET Facial Images

We tested the DLA algorithm against PCA+LDA, newLDA, DLDA and Scatter-LDA on FERET facial image data. We extracted 400 images for the experiment, comprising 50 individuals with 8 images each. The images have been preprocessed and normalized using standard methods from the FERET database. The size of each image is 150 x 130 pixels, with 256 grey levels per pixel.
We randomly choose five images per person for training, and the remaining three for testing. Thus the training set has 250 images while the testing set has 150. Subspaces are calculated from the training data, and the 1-NN classifier (3-NN failed for all the methods due to the five images per person in the training data) is used to obtain accuracy rates after projecting the data onto the subspace. To obtain average performance, each method is repeated 10 times, so there are in total 1,500 testing images for each method. The regularization term λ is chosen by 10-fold cross-validation.

The average accuracy rates are shown in Figure 2, where the X-axis represents the dimensionality of the subspace. For each technique, the higher the dimension, the less discriminant the dimension. For most techniques, the accuracy rates increase quickly in the first 5 dimensions, and then increase slowly with additional dimensions. DLA is uniformly better than the other algorithms, demonstrating its efficacy, and it achieves the highest accuracy rate. newLDA performs quite well in these experiments, again demonstrating that the most discriminant information is in the null space of S_w for the facial recognition task. On the other hand, Scatter-LDA does not perform well in lower dimensional subspaces, but it eventually performs better than PCA+LDA when more than 42 dimensions are used. All methods achieve their highest accuracy rate with a 49-dimensional subspace, which is not surprising, for this is a 50-class problem. It is noticed that the performance of newLDA and Scatter-LDA (whose tail is not shown in the plot) drops quickly with unnecessary dimensions.

Figure 2. Comparison of DLA, PCA+LDA, newLDA, DLDA and Scatter-LDA on the FERET image data.

Table 1 (rows: breast cancer, cancer Wisconsin, credit, heart Cleve, heart Hungary, ionosphere, new thyroid, pima, glass, sonar, iris, average; columns: GDA, NDLA, Better?).
Classification error rates in subspaces computed by GDA and NDLA, using the 3-NN classifier, on UCI data sets.

5.2 UCI Data Sets

In these experiments, we compare nonlinear LDA (GDA) and nonlinear DLA (NDLA) on two-class classification problems. We did not test other nonlinear techniques such as KDDA [16] because of their complex implementation; the implementation of kernel DLDA is very involved. We use data sets from the UCI machine learning database, all of which are two-class classification problems. For each data set, we randomly choose 60% for training and the remaining 40% for testing. We train GDA and NDLA on the training data to obtain projections. We then project both training and test data onto the chosen subspace and use the 3-NN classifier to obtain error rates. Note that for the two-class case, a one-dimensional subspace is sufficient. We use the Gaussian kernel k(x_1, x_2) = exp(−‖x_1 − x_2‖² / σ²) to induce the Reproducing Kernel Hilbert Space for both GDA and NDLA. The kernel parameter σ and the regularization term λ were determined through 10-fold cross-validation. We repeat the experiments 10 times on each data set to obtain average error rates.

The results are shown in Table 1. On 9 data sets out of 11, NDLA performs better than GDA. In particular, on the breast cancer, credit, heart Cleve, heart Hungary and glass data, it is significantly better than GDA, with 95% confidence.

6 Summary

This paper presents a learning approach, discriminant learning analysis, for effective dimension reduction. While discriminant analysis is formulated as a least squares approximation problem, DLA is a regularized one. Regardless of the data distribution, the approach allows us always to measure the goodness of the mapping function that DLA computes: an error bound for it is established. The commonly adopted technique S_w + εI is well justified in the framework of statistical learning theory, and the existence and uniqueness of the term ε are obtained. DLA is also compared to many other competing algorithms, and it is demonstrated, both theoretically and experimentally, that DLA is more robust than the others.

Appendix A

In this appendix, we say Aw = c_1 m and Aw = c_2 m are equivalent in the sense that the solutions w are in the same direction, where A is a matrix, the c_i are scalars, and m is a vector.

First we show that S_m w = m̃ and S_w w = m̃ have the same solution (the same set of eigenvectors), where m̃ = m_1 − m_2 is the mean difference of the two-class problem. We know that solving S_w w = m̃ is equivalent to solving [6]

S_w^{-1} S_b Φ = Φ Λ,   (17)

where Φ and Λ are the eigenvector and eigenvalue matrices of S_w^{-1} S_b. Since S_w = S_m − S_b, following [10] (pp. 454), (17) can be converted to

(S_m − S_b) Φ Λ = S_b Φ
S_m Φ Λ = S_b Φ (I + Λ)
S_m^{-1} S_b Φ = Φ Λ (I + Λ)^{-1}.   (18)

This shows that Φ is also the eigenvector matrix of S_m^{-1} S_b, with eigenvalue matrix Λ(I + Λ)^{-1}. When the components α_i of Λ are arranged from the largest to the smallest as α_1 ≥ ... ≥ α_n, the corresponding components of Λ(I + Λ)^{-1} are also arranged as α_1/(1 + α_1) ≥ ... ≥ α_n/(1 + α_n). That is, the t eigenvectors of S_m^{-1} S_b corresponding to the t largest eigenvalues are the same as the t eigenvectors of S_w^{-1} S_b corresponding to the t largest eigenvalues. As a special case (two-class problems), S_m w = m̃ and S_w w = m̃ share the same eigenvector solution.

Now we show that S_m w = ℓ_1 m_1 − ℓ_2 m_2 and S_m w = m_1 − m_2 also share the same solution. Since the overall mean m_0 is 0, i.e., m_0 = (1/ℓ)(ℓ_1 m_1 + ℓ_2 m_2) = 0, we have m_1 = (ℓ_2/ℓ) m̃, m_2 = −(ℓ_1/ℓ) m̃, and ℓ_1 m_1 − ℓ_2 m_2 = (2ℓ_1ℓ_2/ℓ) m̃. Thus S_m w = ℓ_1 m_1 − ℓ_2 m_2 becomes S_m w = (2ℓ_1ℓ_2/ℓ) m̃.
With a constant c, the solution of S_m w = c m̃ is still in the same direction as the mean difference m̃, and thus is equivalent to solving S_w^{-1} S_b Φ = Φ Λ. It is easy to show that (S_m + λI) w = ℓ_1 m_1 − ℓ_2 m_2 is equivalent to (S_w + λI) w = m_1 − m_2, simply by replacing S_w and S_m by S_w + λI and S_m + λI in the above analysis.

Appendix B: Multi-Class LDA

In Appendix A, we showed that for a two-class problem, LDA and DLA correspond to solving (4) and (12), respectively. S_m can be written as S_m = U_m Λ_m^{1/2} Λ_m^{1/2} U_m^T, so (4) (in its equivalent form S_m w = m̃) can be written as Λ_m^{1/2} U_m^T w = Λ_m^{−1/2} U_m^T m̃ = v. Note that v can be treated as a data point.

For a multi-class problem, assuming S_m is full rank, LDA corresponds to the system S_b W = S_m W Λ. It can be written as

N_b V = V Λ,   (19)

where V = Λ_m^{1/2} U_m^T W and N_b = Λ_m^{−1/2} U_m^T S_b U_m Λ_m^{−1/2}. This is a simple eigenvalue problem. Solving for V, we can compute W by W = U_m Λ_m^{−1/2} V. The eigenvectors in V with the largest eigenvalues have a one-to-one correspondence to the eigenvectors in W with the corresponding largest eigenvalues in the original equation. Note that, with m_0 = 0,

S_b = Σ_{j=1}^{J} ℓ_j (m_j − m_0)(m_j − m_0)^T = Σ_{j=1}^{J} (√ℓ_j m_j)(√ℓ_j m_j)^T,

and N_b is

N_b = Λ_m^{−1/2} U_m^T S_b U_m Λ_m^{−1/2} = Σ_{j=1}^{J} (√ℓ_j Λ_m^{−1/2} U_m^T m_j)(√ℓ_j Λ_m^{−1/2} U_m^T m_j)^T.   (20)

If we treat every vector √ℓ_j Λ_m^{−1/2} U_m^T m_j as a point, then N_b is the covariance matrix of these J points, and Equation (19) can also be viewed as a PCA problem on these J points.

In the above analysis, if we replace S_m by S_m + λI, we obtain that DLA corresponds to solving S_b W = (S_m + λI) W Λ_w, and further

S_b W = (S_w + λI) W Λ_w.   (21)

The eigenvectors corresponding to the largest eigenvalues are chosen as the DLA subspace basis.

Appendix C: Overview of Statistical Learning Theory

The first step in the development of learning theory is the assumption of the existence of a probability measure ρ on the product space Z = X × Y, from which the data are drawn. One way to define the expected error of f is the least squares error ε(f) = ∫_Z (f(x) − y)² dρ for f : X → Y. We try to minimize this error by learning.
Define f_ρ(x) = ∫_Y y dρ(y|x), where ρ(y|x) is the conditional probability measure on Y w.r.t. x. It has been shown [4] that the expected error ε(f_ρ) of f_ρ represents a lower bound on the error ε(f) for any f. Hence, at least in theory, the function f_ρ is the ideal one, and it is often called the target function. However, since the measure ρ is unknown, f_ρ is unknown as well.
Starting from the training data z = {(x_i, y_i)}_{i=1}^m, statistical learning theory as developed by Vapnik [22] builds on the so-called empirical risk minimization (ERM) induction principle. Given a hypothesis space H, one attempts to minimize

(1/m) Σ_{i=1}^m (f_z(x_i) − y_i)², where f_z ∈ H. (22)

However, in general, without controlling the norm of the approximating functions, this problem is ill-posed. Generally speaking, a problem is called well-posed if its solution exists, is unique, and is stable (i.e., depends continuously on the data); a problem is ill-posed if it is not well-posed. Regularization theory [21] is a device developed to turn an ill-posed problem into a well-posed one. The crucial step is to replace the functional in (22) by the following regularized functional

(1/m) Σ_{i=1}^m (f(x_i) − y_i)² + λ ||f||²_K, f ∈ H, (23)

where λ > 0 is a regularization parameter and ||f||²_K is the squared norm of f in the reproducing kernel Hilbert space induced by the kernel K. Minimizing (23) is now a well-posed problem, and the minimizer can be computed by elementary linear algebra. Let f_{λ,z} be the minimizer of (23). The question then is: how good an approximation is f_{λ,z} to the regression function f_ρ, i.e., how small is

ε(f_{λ,z}) = ∫_X (f_{λ,z}(x) − f_ρ(x))² dρ_X ?

Further, what is the best choice of λ to minimize this error? In [5], the answers to these questions are given. They state: for each m ∈ N and δ ∈ (0, 1), there is a function E_{m,δ} = E : R⁺ → R such that, for all λ > 0,

∫_X (f_{λ,z}(x) − f_ρ(x))² dρ_X ≤ E(λ)

with confidence 1 − δ. Moreover, E(λ) has a unique minimizer, which can be found by a simple algorithm and yields the best regularization parameter λ = λ*.

References

[1] M. Bartlett. Further aspects of the theory of multiple regression. In Proceedings of the Cambridge Philosophical Society, volume 34, 1938.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
[3] L. Chen, H. M. Liao, M. T. Ko, J. Lin, and G. Yu. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, 33(10):1713–1726, 2000.
[4] F. Cucker and S. Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1–49, 2002.
[5] F. Cucker and S. Smale. Best choices for regularization parameters in learning theory: On the bias-variance problem. Foundations of Computational Mathematics, 2(4):413–428, 2002.
[6] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1st edition, 1973.
[7] T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
[8] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:178–188, 1936.
[9] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.
[10] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
[11] G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In F. Provost and R. Srikant, editors, Proceedings KDD-2001: Knowledge Discovery and Data Mining, August 26–29, 2001, San Francisco, CA, pages 77–86, New York, 2001.
[12] A. Hoerl and R. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
[13] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.
[14] J. Yang, A. F. Frangi, J.-Y. Yang, D. Zhang, and Z. Jin. KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(2):230–244, 2005.
[15] K. Liu, Y. Cheng, and J. Yang. A generalized optimal set of discriminant vectors. Pattern Recognition, 25(7):731–739, 1992.
[16] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos. Face recognition using kernel direct discriminant analysis algorithms. IEEE Transactions on Neural Networks, 14(1):117–126, 2003.
[17] S. Mika. Kernel Fisher Discriminants. PhD thesis, University of Technology, Berlin, October 2002.
[18] T. Poggio and S. Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50(5):537–544, 2003.
[19] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific Pub. Co., Singapore, 2002.
[20] D. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(8):831–836, 1996.
[21] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. John Wiley and Sons, Washington D.C., 1977.
[22] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[23] H. Yu and J. Yang. A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recognition, 34(10):2067–2070, 2001.
[24] P. Zhang and J. Peng. SVM vs regularized least squares classification. In Proceedings of the IEEE International Conference on Pattern Recognition, volume 1, pages 176–179, 2004.
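The claim that minimizing the regularized functional (23) reduces to elementary linear algebra can be made concrete: by the representer theorem, the minimizer over an RKHS has the form f(x) = Σ_i α_i K(x_i, x), and with the (1/m)-scaled empirical error the coefficients solve (K + λmI)α = y. The following is a minimal sketch, not code from the paper; the Gaussian kernel, the toy data, and all function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of A and B.
    # (Illustrative choice of K; any positive-definite kernel would do.)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_regularized_ls(X, y, lam=1e-3, sigma=1.0):
    # Minimize (1/m) sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over the RKHS of K.
    # Setting the gradient to zero gives the linear system (K + lam*m*I) alpha = y.
    m = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)
    # The minimizer f_{lam,z}, evaluated at new points.
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ alpha

# Toy usage: fit a noisy sine curve (purely illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = fit_regularized_ls(X, y, lam=1e-3, sigma=0.7)
pred = f(X)
```

Sweeping `lam` in this sketch and tracking held-out error is a crude empirical analogue of minimizing the bound E(λ) of [5] to pick λ*.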