Discriminant Analysis: A Unified Approach


Peng Zhang & Jing Peng
Tulane University
Electrical Engineering & Computer Science Department
New Orleans, LA 70118
{zhangp,jp}@eecs.tulane.edu

Norbert Riedel
Tulane University
Mathematics Department
New Orleans, LA 70118
riedel@math.tulane.edu

Abstract

Linear discriminant analysis (LDA) as a dimension reduction method is widely used in data mining and machine learning. It however suffers from the small sample size (SSS) problem when the data dimensionality is greater than the sample size. Many modified methods have been proposed to address some aspect of this difficulty from a particular viewpoint. A comprehensive framework that provides a complete solution to the SSS problem is still missing. In this paper, we provide a unified approach to LDA, and investigate the SSS problem in the framework of statistical learning theory. In such a unified approach, our analysis results in a deeper understanding of LDA. We demonstrate that LDA (and its nonlinear extension) belongs to the same framework in which powerful classifiers such as support vector machines (SVMs) are formulated. In addition, this approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical analysis.

1 Introduction

The purpose of this paper is to present a unified theoretical framework for discriminant analysis. Discriminant analysis [6, 10] has been successfully used as a dimensionality reduction technique for many classification problems. According to Fisher's criterion, one has to find a projection matrix W that maximizes

J(W) = |W^T S_b W| / |W^T S_w W|,   (1)

where S_b and S_w are the so-called between-class and within-class scatter matrices, respectively. In practice, the small sample size (SSS) problem is often encountered, where S_w is singular. Therefore, the maximization problem can be difficult to solve. To address this issue, the term εI is added, where ε is a small positive number and I the identity matrix of proper size. This results in maximizing

J(W) = |W^T S_b W| / |W^T (S_w + εI) W|.   (2)

It can then be solved without any numerical problems. In [9], Friedman discusses regularized discriminant analysis with regard to the small sample size problem. Equation (2) is a special case of his regularized discriminant analysis in practice. However, Friedman leaves open the question of whether an optimal choice for the parameter ε exists, and if so, whether this optimal choice is unique. Further, Friedman's analysis is based on traditional parametric methods in statistics, and Gaussian distributions for the underlying random variables are often assumed. That is, his regularization technique is one of improving the estimation of covariance matrices that are possibly biased due to the small data size. The validity of this technique can be very limited. For example, the nonlinear case of discriminant analysis, called generalized discriminant analysis (GDA), has become very popular in practice and shows promising potential in many applications. The idea is to map the data into a kernel-induced feature space and then perform discriminant analysis there. However, in a high and possibly infinite dimensional feature space, GDA is not well justified from Friedman's point of view. Thus, a rigorous justification for the regularization term εI is still missing, especially when GDA is considered.
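To make the role of the εI term in (2) concrete, the following minimal numpy sketch builds the within-class scatter for a synthetic two-class problem whose dimensionality exceeds the sample size, so S_w is singular, and then solves for the regularized discriminant direction. The data, the value of ε, and all variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small-sample-size setup: 20-dimensional data, only 5 samples per class,
# so the within-class scatter has rank at most 8 and is singular.
X1 = rng.normal(0.0, 1.0, (5, 20))   # class 1 samples (rows)
X2 = rng.normal(1.0, 1.0, (5, 20))   # class 2 samples (rows)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter

eps = 1e-2                            # small positive regularizer, as in eq. (2)
# For two classes, maximizing eq. (2) yields a direction proportional to
# (S_w + eps I)^{-1} (m_1 - m_2); the ridge term makes the solve well defined.
w = np.linalg.solve(Sw + eps * np.eye(Sw.shape[0]), m1 - m2)
w /= np.linalg.norm(w)
print("regularized discriminant direction (first 5 coords):", w[:5])
```

Without the εI term, the same solve would fail (or be meaningless) because S_w is rank deficient in this regime.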
In [7], Evgeniou, Pontil and Poggio present a theoretical framework for the regularization functional (the so-called regularization network)

min_{f ∈ H} H[f] = Σ_{i=1}^{ℓ} (y_i - f(x_i))^2 + λ ||f||^2_K,   (3)

where ||f||^2_K is a norm in a Reproducing Kernel Hilbert Space H defined by the positive definite function K, ℓ is the number of data points or examples, and λ is a regularization parameter. In their paper they justify the above functional using statistical learning theory, which is mostly developed by Vapnik [22].

In this paper, we will show that discriminant analysis can be cast in the framework of statistical learning theory. More specifically, Fisher's criterion (1) is equivalent to the regularization network (3) without regularization, that is,

min_{f ∈ H} H[f] = Σ_{i=1}^{ℓ} (y_i - f(x_i))^2.

And Equation (2) is equivalent to the regularization network (3) with the parameter ε corresponding to the regularization term λ. Thus the role of εI is well justified by statistical learning theory. Further, we show that the optimal choice of λ exists, and that it is unique. Additionally, we establish an error bound for the function that discriminant analysis tries to find.

1.1 Related Work

The relationship between least squares regression and Fisher's linear discriminant has been well known for a long time. There is a good review in [6]. Actually, in the seminal paper [8], Fisher already pointed out its connection to the regression solution. In [17], Mika also provides an extensive review of LDA. Another widely used technique in statistics, canonical correlation analysis, is also closely related to least squares regression [1].

Regularized least squares (RLS) methods have been studied for a long time, under different names. In statistics, ridge regression [12] has been very popular for solving badly conditioned linear regression problems. After Tikhonov published his book [21], it was realized that ridge regression uses the regularization term in Tikhonov's sense. In the 1980's, weight decay was proposed to help prune unimportant neural network connections, and it was soon recognized [13] that weight decay is equivalent to ridge regression. Recently this old method was found essential in the framework of statistical learning theory, labeled by different names (RLS [18], Regularization Networks [7], least squares SVMs [19], and proximal SVMs [11]). It has been demonstrated that this regression method is fully comparable to SVMs when used as a classifier [18, 24]. Most recently, an error bound for RLS given a finite sample data set was developed in [5].

Many elements discussed in this paper are scattered throughout the literature and may look familiar to some readers but not to others. The fact is that most treatments are not comprehensive enough for this topic, and unclear issues remain for many. For example, many researchers such as Mika [17] apply the regularization technique to address the SSS problem, but they still follow Friedman's principle, which is quite limited. It is our goal to adopt a simple yet unified approach that makes this topic more comprehensive.

2 Discriminant Learning Analysis

In this section, we first review LDA using Fisher's criterion, and then go on to study it from a learning theory point of view, which we call discriminant learning analysis (DLA).

2.1 Linear Discriminant Analysis

In LDA, within-class, between-class, and mixture scatter matrices are used to formulate the criteria of class separability. Consider a J-class problem, where m_0 is the mean vector of all data, and m_j is the mean vector of the j-th class data. A within-class scatter matrix characterizes the scatter of samples around their respective class mean vectors, and it is expressed by

S_w = Σ_{j=1}^{J} Σ_{i=1}^{ℓ_j} (x_i^j - m_j)(x_i^j - m_j)^T,

where ℓ_j is the size of the data in the j-th class. A between-class scatter matrix characterizes the scatter of the class means around the mixture mean m_0. It is expressed by

S_b = Σ_{j=1}^{J} ℓ_j (m_j - m_0)(m_j - m_0)^T.

The mixture scatter matrix is the covariance matrix of all samples, regardless of their class assignments, and it is given by

S_m = Σ_{i=1}^{ℓ} (x_i - m_0)(x_i - m_0)^T = S_w + S_b.
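A small numpy sketch of the three scatter matrices just defined (as unnormalized sums over the samples) may help fix the bookkeeping; the toy data, the labels, and the helper name scatter_matrices are illustrative only.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class, between-class, and mixture scatter for samples X (rows) with labels y."""
    m0 = X.mean(axis=0)                       # mixture mean m_0
    p = X.shape[1]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                  # class mean m_j
        Sw += (Xc - mc).T @ (Xc - mc)         # scatter around the class mean
        Sb += len(Xc) * np.outer(mc - m0, mc - m0)
    Sm = (X - m0).T @ (X - m0)                # mixture scatter
    return Sw, Sb, Sm

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.integers(0, 3, size=30)               # a toy 3-class problem
Sw, Sb, Sm = scatter_matrices(X, y)
print(np.allclose(Sm, Sw + Sb))               # checks the identity S_m = S_w + S_b
```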
Fisher's criterion is used to find the projection matrix that maximizes (1). In order to determine the matrix W that maximizes J(W), one can solve the generalized eigenvalue problem S_b w_i = λ_i S_w w_i. The eigenvectors corresponding to the largest eigenvalues form the columns of W. For a two-class problem, it can be written in a simpler form:

S_w w = Δm = m_1 - m_2,   (4)

where m_1 and m_2 are the means of the two classes.

2.2 LDA as Least Squares Function Estimation

LDA is a supervised subspace learning problem where we are given a set z = {(x_i, y_i)}_{i=1}^{ℓ} of training examples drawn i.i.d. from the probability space Z = X × Y. Here a probability measure ρ is defined, and the x_i are the n-dimensional inputs. We first consider two-class problems, where we have y_i ∈ {1, -1} as class labels. Without loss of generality, we assume x has zero mean, i.e., E(x) = 0. Let us consider

f_MSE = arg min_f Σ_{i=1}^{ℓ} (y_i - f(x_i))^2.   (5)

This is a mean squared function estimation problem. Here we first consider the hypothesis space H_L, the function class of linear projections. That is, H_L = {f | f(x) = w^T x, x ∈ X}. Thus we will solve

w_opt = arg min_w Σ_{i=1}^{ℓ} (y_i - w^T x_i)^2.   (6)

It turns out that the solution of (6) is the same as the solution obtained by Fisher's criterion.

Lemma 1. The linear system derived by the least squares criterion (6) is equivalent to the one derived by Fisher's criterion (1), up to a constant, in two-class problems.

Proof: We rewrite the objective in (6) as

Σ_{i=1}^{ℓ} (y_i - w^T x_i)^2 = (y - X^T w)^T (y - X^T w) = (y^T - w^T X)(y - X^T w) = ℓ - 2(ℓ_1 m_1 - ℓ_2 m_2)^T w + w^T S_m w,

where we use the fact that Xy = ℓ_1 m_1 - ℓ_2 m_2. Taking the derivative with respect to w and setting the result to 0, we obtain

S_m w = ℓ_1 m_1 - ℓ_2 m_2.   (7)

This is equivalent to Equation (4). When S_w is full rank, the three equations S_m w = ℓ_1 m_1 - ℓ_2 m_2, S_m w = m_1 - m_2, and S_w w = m_1 - m_2 have the same solution w up to a constant factor, given that the overall mean is 0. For details see Appendix A.

2.3 Interpretation of LDA within Learning Theory

When both classes are Gaussian distributed, LDA can be shown to be optimal. More often in reality they are not, however, and then it is not clear how good LDA is. The conventional view of LDA is limited here. The learning framework, on the other hand, is more general. There is no need to assume the distribution of the data in order to discuss the goodness. Instead, the learning approach directly aims to find a good function mapping, and the goodness is well defined (see Appendix C). In a general sense, LDA tries to find a transformation f such that the transformed data have a large class mean difference and small within-class scatters. The equivalence between (6) and (1) can be understood in another way: one minimizes the mean squared error with respect to each class mean while keeping the mean difference constant.

We define f_ρ as the best function that minimizes the mean squared error. That is,

f_ρ = arg min_f ∫_Z (y - f(x))^2 dρ.   (8)

Given a set of data z, one tries to learn the function f_z that is as close as possible to the optimal mapping f_ρ. That is, f_opt = arg min_{f_z} ∫_Z (f_z - f_ρ)^2 dρ. However, the probability measure ρ is unknown and so is f_ρ. Instead, we may consider (5), which is called empirical error (risk) minimization. Because (6) is equivalent to Fisher's criterion, it is clear that Fisher's criterion leads to an empirical error minimization problem. From the statistical learning theory point of view, however, without controlling the function norm, solving equation (5) often leads to overfitting the data, and the solution is not stable. And when the sample size is smaller than the dimensionality, it is an ill-posed problem and the solution is not unique. That is where diverse LDA techniques have been developed to solve the SSS problem.

2.4 Discriminant Learning Analysis as Regularized Least Squares Regression

Following [7], we instead minimize over a hypothesis space H the following regularized functional for a fixed positive parameter λ:

f_DLA = arg min_{f ∈ H} Σ_{i=1}^{ℓ} (y_i - f(x_i))^2 + λ ||f||^2_H.   (9)

Again, we consider the linear projection function space H_L. In this case, ||f||^2 = w^T w, and

w_opt = arg min_w Σ_{i=1}^{ℓ} (y_i - w^T x_i)^2 + λ w^T w.   (10)

Thus,

Σ_{i=1}^{ℓ} (y_i - w^T x_i)^2 + λ w^T w = ℓ - 2(ℓ_1 m_1 - ℓ_2 m_2)^T w + w^T (S_m + λI) w.

Taking the derivative with respect to w and setting the result to 0, we have

(S_m + λI) w = ℓ_1 m_1 - ℓ_2 m_2.   (11)

In Appendix A, it is shown that (11) is equivalent to

(S_w + λI) w = Δm,   (12)

and therefore solving it is equivalent to maximizing (2). We call this approach discriminant learning analysis. Since for any given positive regularization term λ, (9) always has a unique solution, we have:

Proposition 2. Discriminant learning analysis is a well-posed problem.
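The equivalence between the ridge-regression view (10)-(11) and the discriminant view (12) can be checked numerically. The following minimal sketch uses made-up two-class data with the overall mean removed, as assumed in Section 2.2, and an arbitrary λ; it is a sanity check, not part of the original experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, d = 12, 8, 5
X = np.vstack([rng.normal(-1.0, 1.0, (n1, d)),
               rng.normal(+1.0, 1.0, (n2, d))])
y = np.concatenate([np.ones(n1), -np.ones(n2)])
X -= X.mean(axis=0)                      # enforce zero overall mean

m1, m2 = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
Sw = ((X[y == 1] - m1).T @ (X[y == 1] - m1)
      + (X[y == -1] - m2).T @ (X[y == -1] - m2))

lam = 0.5
# Regularized least squares, eq. (10): normal equations (X^T X + lam I) w = X^T y,
# which is eq. (11), since X^T X = S_m and X^T y = l_1 m_1 - l_2 m_2 here.
w_rls = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
# Discriminant learning analysis, eq. (12): (S_w + lam I) w = m_1 - m_2.
w_dla = np.linalg.solve(Sw + lam * np.eye(d), m1 - m2)

cos = w_rls @ w_dla / (np.linalg.norm(w_rls) * np.linalg.norm(w_dla))
print("cosine between the two solutions:", cos)   # ~1.0: same direction up to scale
```

With the overall mean at zero, the two directions coincide up to scale, which is the content of the Appendix A argument in the regularized setting.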
2.5 Nonlinear DLA

In the new framework of discriminant learning analysis, the nonlinear case is not an extension of the linear case by the kernel trick. In the above analysis we have adopted the linear hypothesis space H_L, whereby we have been looking for an optimum solution in that space. If we choose a general Reproducing Kernel Hilbert Space (RKHS) as our hypothesis space, we will obtain nonlinear function mappings as the subspace. The derivation of DLA in a general function space is simply carried out in terms of functional analysis, in a similar way as in the linear space. Here we skip the details of the derivations. Briefly speaking, for a two-class problem, the nonlinear DLA solution is found by following a simple algorithm [18]:

(K + λI) c = y,   (13)

where K is the Gram matrix of the training data, K = [k(x_i, x_j)], and k is the kernel function. y is the vector of class labels. After solving for c = [c_1, ..., c_ℓ], any given new data point x will be projected onto the subspace by s = Σ_{i=1}^{ℓ} c_i k(x_i, x).

2.6 Error Bound for DLA

By setting discriminant learning analysis in the framework of statistical learning theory, we can provide an error bound for the f_DLA obtained by (9) in the following theorem:

Theorem 3. Let f_ρ be defined by (8). With confidence 1 - δ, the error bound for the solution f_DLA of (9) is given by

∫_X (f_DLA - f_ρ)^2 dρ_X ≤ S(λ) + A(λ),   (14)

where A(λ) and S(λ) are given by λ^{1/2} ||L_K^{-1/4} f_ρ||^2 and 32M^2 (λ + C_K)^2 / λ^2 · v*(ℓ, δ), respectively. And there exists a unique λ that minimizes S(λ) + A(λ).

This decomposition is often called the bias-variance tradeoff. A(λ) is called the approximation error. Given a hypothesis space, it measures the error between the optimal function in it and the true target. L_K^{-1/4} in A(λ) is simply an operator. S(λ) bounds the sample error and is essentially derived from the law of large numbers. M and C_K are constants, and v*(ℓ, δ) is the unique solution of an equation whose coefficients contain the sample size ℓ and the confidence parameter δ. Due to its complex nature, we do not provide the details of the proof, which can be found in [5]. To the best of our knowledge, this is the first known error bound for LDA.

3 Multi-class DLA

We have presented discriminant learning analysis as an alternative approach to LDA in two-class problems. The data is projected onto only one dimension, which is adequate for the two-class problem. However, LDA is generally used to find a subspace with d dimensions for a multiple-class problem. In this section we extend DLA to the multi-class case.

For a two-class problem, the function f is a mapping from a vector x to a scalar y ∈ R. If we want to find a subspace with d > 1 dimensions, we actually consider a mapping from a vector x to a vector y ∈ R^d. This direct approach has some difficulties. First, statistical learning theory is concerned with only the first kind of function mapping f : R^n → R, and cannot be directly extended to f : R^n → R^d. Second, the subspace dimensionality d is not fixed (rather, it is often provided by the user). These difficulties make the attempt at directly casting DLA in the multiple-class case as a least squares estimation problem a very difficult task.

Notice that LDA in a multiple-class problem can be decomposed into two-class problems. In the i-th two-class problem, it treats the i-th class as one class and all remaining classes as the second class. Each binary class problem is solved first, and after finding all subspaces, PCA is applied to find the eigenvectors having the largest eigenvalues. These new eigenvectors are the solution of the original multi-class LDA problem (see Appendix B). Thus, DLA can be naturally extended to the multi-class case. Simply, in the decomposition step, we replace two-class LDA by two-class DLA. In the linear case, it turns out that the DLA algorithm in the multi-class case simply solves

S_b W = (S_w + λI) W Λ.   (15)

This is equivalent to solving (2).

4 Discussions

In this section, we will analyze the role of the regularization term λ. We are especially interested in comparing the DLA algorithm (15) to various LDA algorithms, such as PCA+LDA [2, 20], Scatter-LDA [15, 10], newLDA [3], and DLDA [23].
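For concreteness, a minimal sketch of the multi-class DLA solve in eq. (15) is given below, using scipy's generalized symmetric eigensolver; the synthetic data, the choice λ = 1, and the helper name dla_subspace are assumptions of this illustration rather than details fixed by the paper.

```python
import numpy as np
from scipy.linalg import eigh

def dla_subspace(X, y, lam, d):
    """Solve S_b W = (S_w + lam I) W Lambda (eq. (15)) and return the top-d directions."""
    m0 = X.mean(axis=0)
    p = X.shape[1]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - m0, mc - m0)
    # Generalized symmetric-definite eigenproblem: S_b w = eigval (S_w + lam I) w.
    eigvals, eigvecs = eigh(Sb, Sw + lam * np.eye(p))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    return eigvecs[:, order[:d]]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(mu, 1.0, (10, 50)) for mu in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 10)                   # 30 samples in 50 dimensions: SSS case
W = dla_subspace(X, y, lam=1.0, d=2)           # project onto a 2-D DLA subspace
print((X @ W).shape)
```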
These techniques are mostly proposed for solving facial recognition problems, where the SSS problem always occurs. We summarize them briefly:

PCA+LDA: Apply PCA to remove the null space of S_w first, then maximize J(W) = |W^T S_b W| / |W^T S_w W|.

Scatter-LDA: Same as PCA+LDA but maximizing J(W) = |W^T S_b W| / |W^T S_m W| instead.

newLDA: If S_w is full rank, then solve regular LDA; else, in the null space of S_w, find the eigenvectors of S_b with the largest eigenvalues.

DLDA: Apply PCA to remove the null space of S_b first, then find the eigenvectors of S_w corresponding to the smallest eigenvalues.

Some researchers also maximize J(W) = |W^T S_m W| / |W^T S_w W|; this is equivalent to maximizing J(W) = |W^T S_b W| / |W^T S_w W|.
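As one possible reading of the PCA+LDA baseline summarized above (in the spirit of [2, 20]), the sketch below first reduces the data with PCA so that the within-class scatter becomes nonsingular in the reduced space, and then applies Fisher's criterion there. The number of retained components, the toy data, and the helper name pca_lda are assumptions of this illustration; actual implementations differ in such details.

```python
import numpy as np
from scipy.linalg import eigh

def pca_lda(X, y, d):
    """PCA to a subspace where S_w is nonsingular, then Fisher's criterion there."""
    n, p = X.shape
    classes = np.unique(y)
    # PCA step: keep at most n - (number of classes) components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[: n - len(classes)].T               # p x k projection
    Z = Xc @ P
    # LDA step in the reduced space.
    m0 = Z.mean(axis=0)
    k = Z.shape[1]
    Sw, Sb = np.zeros((k, k)), np.zeros((k, k))
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - m0, mc - m0)
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1]
    return P @ eigvecs[:, order[:d]]           # map back to the original space

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(mu, 1.0, (10, 50)) for mu in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 10)
W = pca_lda(X, y, d=2)
print((X @ W).shape)
```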

It should be noted that PCA+LDA and Scatter-LDA can be equivalent when S_w and S_m span the same subspace. However, they are different when S_b totally or partially spans the null space of S_w, so that S_w and S_m span different subspaces. For facial images the latter case turns out to be more common. In [3], Chen et al. prove that the null space of S_w contains discriminant information. They also show that Scatter-LDA is not optimal in that it fails to distinguish the most discriminant information in the null space of S_w. Thus they proposed the newLDA method. However, newLDA falls short of making use of any information outside of that null space. DLDA, on the other hand, discards the null space of S_b. This is very problematic. First, it fails to distinguish the most discriminant information in the null space of S_w, as in Scatter-LDA. Second, the null space of S_b is not completely useless. This can be seen in a two-class case: DLDA simply chooses Δm as the subspace. However, this can be biased because it ignores S_w. It should be noted that in [16] there is a misleading claim, where the authors mention that there is no significant difference between newLDA and DLDA. This incorrect claim is also pointed out by others [14]. In a word, all these techniques make hard decisions, either discarding a null space, or only working in a null space. In the following, we can see that DLA always works in the whole space, not making any hard decisions. Instead, it fuses information from both subspaces.

DLA can work in the whole space because it is well-posed. We first consider how LDA is solved. For the moment, assume that S_w^{-1} exists. For the symmetric matrix S_w there is a matrix U such that U^T U = I, S_w = U Σ U^T, and S_w^{-1} = U Σ^{-1} U^T, where Σ is a diagonal matrix. U^T is a transform matrix. Consider the following in a two-class LDA, and assume that S_w is full rank; thus

w = S_w^{-1} Δm = U Σ^{-1} U^T Δm.   (16)

The between vector Δm is transformed to a new basis by U^T, then multiplied by the constant Σ^{-1}_{ii} along each dimension, and finally transformed again by U back to the original basis. To illustrate this process, we consider a simple case where the data are in two dimensions. x_1 and x_2 are the original axis bases, as shown in Figure 1, and x'_1 and x'_2 are the orthogonal axis bases in which S_w is diagonalized. We choose S_w such that U^T S_w U = Σ = diag(2, 1/2). Thus Σ^{-1}_{11} = 1/2 and Σ^{-1}_{22} = 2. Let m' = U^T Δm. Then m'_1 and m'_2 are the projections of Δm on x'_1 and x'_2, respectively. Multiplying by Σ^{-1} maps m'_1 to (1/2) m'_1 and m'_2 to 2 m'_2. The LDA solution, S_w^{-1} Δm, is shown in Figure 1(a) and denoted by LDA.

Now consider DLA: (S_w + λI) w = Δm. Notice that if the matrix U can diagonalize S_w then it can diagonalize (S_w + λI) at the same time: U^T (S_w + λI) U = U^T S_w U + λ U^T U = Σ + λI. Thus, (S_w + λI)^{-1} = U (Σ + λI)^{-1} U^T. To illustrate the effect of the regularization term, we choose λ = 1; in Figure 1, the Λ_i are the eigenvalues of (S_w + λI)^{-1}. Now Λ_1 = 1/3 and Λ_2 = 2/3. The vector (S_w + λI)^{-1} Δm is shown in Figure 1(a) and denoted by DLA.

We now consider the case where the SSS problem occurs. We keep Σ_22 = 1/2 the same, but the original eigenvalue of S_w is now 0 along the first dimension, so that Λ_1 = 1. This case is plotted in Figure 1(b). DLA takes advantage of information from both worlds. However, PCA+LDA will throw away the dimension x'_1 and only keep the direction x'_2. Thus it ignores critical discriminant information.

[Figure 1. Illustration and comparison of the solutions obtained by LDA, DLA, DLDA and newLDA: (a) the solution of LDA and DLA; (b) the solution of DLA when S_w's eigenvalue is zero in one dimension.]
newlda, on the other hand, wi ony keep the dimension x and throw away any discriminant information aong x 2. DLDA simpy chooses m. Thus, when the SSS probem does not occur, DLA has a smoothing effect on the eigenvaues of S w. If the SSS probem does occur, DLA fuses and keeps a baance between the information both in and out of the nu space of S w. This baance is achieved by choosing proper λ. 5 Experiments 5. Feret Facia Images We tested DLA agorithm against PCA+LDA, newlda, DLDA and Scatter-LDA on FERET facia image data ( We extracted 400 images for the experiment, where there are 50 individuas with 8 images from each. The images have been preprocessed and normaized using standard methods from the FERET database. The size of each image is 50 x 30 pixes, with 256 grey eves per pixe.

We randomly choose five images per person for training, and the remaining three for testing. Thus the training set has 250 images while the testing set has 150. Subspaces are calculated from the training data, and the 1-NN classifier (3-NN failed for all the methods due to the five images per person in the training data) is used to obtain the accuracy rates after projecting the data onto the subspace. To obtain average performance, each method is repeated 10 times; thus there are in total 1,500 testing images for each method. The regularization term λ is chosen by 10-fold cross-validation.

The average accuracy rates are shown in Figure 2. The X-axis represents the dimensionality of the subspace. For each technique, the higher the dimension, the less discriminant the dimension. For most techniques, the accuracy rates increase quickly in the first 5 dimensions, and then increase slowly with additional dimensions. DLA is uniformly better than the other algorithms, demonstrating its efficacy, and achieves the highest accuracy rate. newLDA performs quite well in these experiments, again demonstrating that the most discriminant information is in the null space of S_w for the facial recognition task. On the other hand, Scatter-LDA does not perform well at lower dimensional subspaces, but it eventually performs better than PCA+LDA when the dimension exceeds 42. All methods achieve their highest accuracy rate with a 49-dimensional subspace, which is not surprising, for this is a 50-class problem. It is noticed that the performance of newLDA and Scatter-LDA (its tail is not shown in the plot) drops quickly with unnecessary dimensions.

[Figure 2. Comparison of DLA, PCA+LDA, newLDA, DLDA and Scatter-LDA on the FERET image data: accuracy rate versus subspace dimensionality.]

[Table 1. Classification error rates in subspaces computed by GDA and NDLA, using a 3-NN classifier, on the UCI data sets: breast cancer, cancer Wisconsin, credit, heart Cleve, heart Hungary, ionosphere, new thyroid, pima, glass, sonar, iris, and their average.]

5.2 UCI Data Sets

In these experiments, we compare nonlinear LDA (GDA) and nonlinear DLA (NDLA) on two-class classification problems. We did not test other nonlinear techniques such as KDDA [16] due to their complex implementation; the implementation of kernel DLDA is very involved. We use data sets from the UCI machine learning database. They are all two-class classification problems. For each data set, we randomly choose 60% as training and the remaining 40% as testing. We train GDA and NDLA on the training data and obtain projections. We then project both training and test data onto the chosen subspace and use the 3-NN classifier to obtain error rates. Note that for the two-class case, a one-dimensional subspace is sufficient. We use the Gaussian kernel k(x_1, x_2) = exp(-||x_1 - x_2||^2 / σ^2) to induce the Reproducing Kernel Hilbert Space for both GDA and NDLA. The kernel parameter σ and the regularization term λ were determined through 10-fold cross-validation. We repeat the experiments 10 times on each data set to obtain the average error rates.

The results are shown in Table 1. On 9 data sets out of 11, NDLA performs better than GDA. Especially on the breast cancer, credit, heart Cleve, heart Hungary and glass data, it is significantly better than GDA, with 95% confidence.

6 Summary

This paper presents a learning approach, Discriminant Learning Analysis, for effective dimension reduction. While discriminant analysis is formulated as a least squares approximation problem, DLA is a regularized one. Regardless of the data distribution, the approach allows us to always measure the goodness of the mapping function that DLA computes: an error bound for it is established. The commonly adopted technique S_w + εI is well justified in the framework of statistical learning theory, and the existence and uniqueness of the term ε is obtained. DLA is also compared to many other competing algorithms, and it is demonstrated, both theoretically and experimentally, that DLA is more robust than the others.

Appendix A

In this appendix, we say Aw = c_1 m and Aw = c_2 m are equivalent in the sense that the solutions w are in the same direction, where A is a matrix, the c_i are scalars, and m is a vector. First we show that S_m w = Δm and S_w w = Δm have the same solution (same set of eigenvectors), where Δm = m_1 - m_2 is the mean difference of the two-class problem. We know that solving S_w w = Δm is equivalent to solving [6]

S_w^{-1} S_b Φ = Φ Λ,   (17)

where Φ and Λ are the eigenvector and eigenvalue matrices of S_w^{-1} S_b. Since we have S_w = S_m - S_b, following [10] (pp. 454), (17) can be converted to

(S_m - S_b) Φ Λ = S_b Φ,
S_m Φ Λ = S_b Φ (I + Λ),
S_m^{-1} S_b Φ = Φ Λ (I + Λ)^{-1}.   (18)

This shows that Φ is also the eigenvector matrix of S_m^{-1} S_b, and its eigenvalue matrix is Λ(I + Λ)^{-1}. When the components α_i of Λ are arranged from the largest to the smallest as α_1 ≥ ... ≥ α_n, the corresponding components of Λ(I + Λ)^{-1} are also arranged as α_1/(1 + α_1) ≥ ... ≥ α_n/(1 + α_n). That is, the t eigenvectors of S_m^{-1} S_b corresponding to the t largest eigenvalues are the same as the t eigenvectors of S_w^{-1} S_b corresponding to the t largest eigenvalues. As a special case (two-class problems), S_m w = Δm and S_w w = Δm share the same eigenvector solution.

Now we show that S_m w = ℓ_1 m_1 - ℓ_2 m_2 and S_m w = m_1 - m_2 also share the same solution. Consider that the overall mean m_0 is 0. Since m_0 = (ℓ_1 m_1 + ℓ_2 m_2)/ℓ = 0, we have m_1 = (ℓ_2/ℓ) Δm, m_2 = -(ℓ_1/ℓ) Δm, and ℓ_1 m_1 - ℓ_2 m_2 = (2 ℓ_1 ℓ_2 / ℓ) Δm. Thus S_m w = ℓ_1 m_1 - ℓ_2 m_2 becomes S_m w = (2 ℓ_1 ℓ_2 / ℓ) Δm. For a constant c, the solution of S_m w = c Δm is still in the same direction along the mean difference Δm, and thus it is equivalent to solving S_w^{-1} S_b Φ = Φ Λ. It is easy to show that (S_m + λI) w = ℓ_1 m_1 - ℓ_2 m_2 is equivalent to (S_w + λI) w = m_1 - m_2, simply by replacing S_w and S_m with S_w + λI and S_m + λI in the above analysis.

Appendix B: Multi-Class LDA

In Appendix A, we showed that for a two-class problem, LDA and DLA correspond to solving (4) and (12), respectively. S_m can be written as S_m = U_m Λ_m^{1/2} Λ_m^{1/2} U_m^T, so (4) (in its equivalent form S_m w = Δm) can be written as Λ_m^{1/2} U_m^T w = Λ_m^{-1/2} U_m^T Δm = v. Note that v can be treated as a data point. For a multi-class problem, assuming S_m is full rank, LDA corresponds to the system S_b W = S_m W Λ. It can be written as

N_b V = V Λ,   (19)

where V = Λ_m^{1/2} U_m^T W and N_b = Λ_m^{-1/2} U_m^T S_b U_m Λ_m^{-1/2}. This is a simple eigenvalue problem. By solving for V, we can compute W as W = U_m Λ_m^{-1/2} V. The eigenvectors in V with the largest eigenvalues have a one-to-one correspondence to the eigenvectors in W with the corresponding largest eigenvalues in the original equation. Note that, with m_0 = 0, S_b = Σ_{j=1}^{J} ℓ_j (m_j - m_0)(m_j - m_0)^T = Σ_{j=1}^{J} (√ℓ_j m_j)(√ℓ_j m_j)^T, and N_b is

N_b = Λ_m^{-1/2} U_m^T S_b U_m Λ_m^{-1/2} = Σ_{j=1}^{J} (√ℓ_j Λ_m^{-1/2} U_m^T m_j)(√ℓ_j Λ_m^{-1/2} U_m^T m_j)^T.   (20)

If we treat every vector √ℓ_j Λ_m^{-1/2} U_m^T m_j as a point, then N_b is the covariance of these J points. Equation (19) can also be viewed as a PCA problem on these J points.

In the above analysis, if we replace S_m by S_m + λI, we obtain that DLA corresponds to solving S_b W = (S_m + λI) W Λ_w, and further

S_b W = (S_w + λI) W Λ_w.   (21)
The eigenvectors corresponding to the largest eigenvalues are chosen as the DLA subspace basis.

Appendix C: Overview of Statistical Learning Theory

The first step in the development of learning theory is the assumption of the existence of a probability measure ρ on the product space Z = X × Y, from which the data are drawn. One way to define the expected error of f is the least squares error: ε(f) = ∫_Z (f(x) - y)^2 dρ for f : X → Y. We try to minimize this error by learning. Define f_ρ(x) = ∫_Y y dρ(y|x), where ρ(y|x) is the conditional probability measure on Y w.r.t. x. It has been shown [4] that the expected error ε(f_ρ) of f_ρ represents a lower bound on the error ε(f) for any f. Hence, at least in theory, the function f_ρ is the ideal one and so is often called the target function. However, since the measure ρ is unknown, f_ρ is unknown as well.

Starting from the training data z = {(x_i, y_i)}_{i=1}^{ℓ}, statistical learning theory as developed by Vapnik [22] builds on the so-called empirical risk minimization (ERM) induction principle. Given a hypothesis space H, one attempts to minimize

Σ_{i=1}^{ℓ} (f_z(x_i) - y_i)^2, where f_z ∈ H.   (22)

However, in general, without controlling the norm of the approximating functions, this problem is ill-posed. Generally speaking, a problem is called well-posed if its solution exists, is unique, and is stable (depends continuously on the data). A problem is ill-posed if it is not well-posed. Regularization theory [21] is a device that was developed to turn an ill-posed problem into a well-posed one. The crucial step is to replace the functional in (22) by the following regularized functional:

Σ_{i=1}^{ℓ} (f(x_i) - y_i)^2 + λ ||f||^2_K, f ∈ H,   (23)

where λ is a regularization term and ||f||^2_K is the norm of f in L^2_ρ(X). Minimizing (23) is now well-posed and can be solved by elementary linear algebra. Let f_{λ,z} be the minimizer of (23); the question then is: how good an approximation is f_{λ,z} to f_ρ, or how small is ε(f_{λ,z}) = ∫_X (f_{λ,z}(x) - f_ρ(x))^2 dρ_X? Further, what is the best choice of λ to minimize this error? In [5], the answers to these questions are given. They state: for each m ∈ N and δ ∈ [0, 1), there is a function E_{m,δ} = E : R^+ → R such that, for all λ > 0, ∫_X (f_{λ,z}(x) - f_ρ(x))^2 dρ_X ≤ E(λ) with confidence 1 - δ. And there is a unique minimizer of E(λ), found by a simple algorithm, that yields the best regularization parameter λ.

References

[1] M. Bartlett. Further aspects of the theory of multiple regression. In Proceedings of the Cambridge Philosophical Society, number 34, 1938.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.
[3] L. Chen, H. M. Liao, M. T. Ko, J. Lin, and G. Yu. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, 33:1713-1726, 2000.
[4] F. Cucker and S. Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1-49, 2001.
[5] F. Cucker and S. Smale. Best choices for regularization parameters in learning theory: On the bias-variance problem. Foundations of Computational Mathematics, 2(4):413-428, 2002.
[6] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1st edition, 1973.
[7] T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1-50, 2000.
[8] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:178-188, 1936.
[9] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165-175, 1989.
[10] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
[11] G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In F. Provost and R. Srikant, editors, Proceedings KDD-2001: Knowledge Discovery and Data Mining, August 26-29, 2001, San Francisco, CA, pages 77-86, New York, 2001.
[12] A. Hoerl and R. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(3):55-67, 1970.
[13] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.
[14] J. Yang, A. F. Frangi, J.-Y. Yang, D. Zhang, and Z. Jin. KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(2):230-244, 2005.
[15] K. Liu, Y. Cheng, and J. Yang. A generalized optimal set of discriminant vectors. Pattern Recognition, 25(7):731-739, 1992.
[16] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos. Face recognition using kernel direct discriminant analysis algorithms. IEEE Transactions on Neural Networks, 14(1):117-126, 2003.
[17] S. Mika. Kernel Fisher Discriminants. PhD thesis, University of Technology, Berlin, October 2002.
[18] T. Poggio and S. Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50(5):537-544, 2003.
[19] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific Publishing Co., Singapore, 2002.
[20] D. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(8):831-836, 1996.
[21] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. John Wiley and Sons, Washington D.C., 1977.
[22] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[23] H. Yu and J. Yang. A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recognition, 34:2067-2070, 2001.
[24] P. Zhang and J. Peng. SVM vs regularized least squares classification. In Proceedings of the IEEE International Conference on Pattern Recognition, volume 1, pages 176-179, 2004.


More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

School of Electrical Engineering, University of Bath, Claverton Down, Bath BA2 7AY

School of Electrical Engineering, University of Bath, Claverton Down, Bath BA2 7AY The ogic of Booean matrices C. R. Edwards Schoo of Eectrica Engineering, Universit of Bath, Caverton Down, Bath BA2 7AY A Booean matrix agebra is described which enabes man ogica functions to be manipuated

More information

Sparse Semi-supervised Learning Using Conjugate Functions

Sparse Semi-supervised Learning Using Conjugate Functions Journa of Machine Learning Research (200) 2423-2455 Submitted 2/09; Pubished 9/0 Sparse Semi-supervised Learning Using Conjugate Functions Shiiang Sun Department of Computer Science and Technoogy East

More information

Some Properties of Regularized Kernel Methods

Some Properties of Regularized Kernel Methods Journa of Machine Learning Research 5 (2004) 1363 1390 Submitted 12/03; Revised 7/04; Pubished 10/04 Some Properties of Reguarized Kerne Methods Ernesto De Vito Dipartimento di Matematica Università di

More information

BP neural network-based sports performance prediction model applied research

BP neural network-based sports performance prediction model applied research Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied

More information

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces Abstract and Appied Anaysis Voume 01, Artice ID 846396, 13 pages doi:10.1155/01/846396 Research Artice Numerica Range of Two Operators in Semi-Inner Product Spaces N. K. Sahu, 1 C. Nahak, 1 and S. Nanda

More information

Tracking Control of Multiple Mobile Robots

Tracking Control of Multiple Mobile Robots Proceedings of the 2001 IEEE Internationa Conference on Robotics & Automation Seou, Korea May 21-26, 2001 Tracking Contro of Mutipe Mobie Robots A Case Study of Inter-Robot Coision-Free Probem Jurachart

More information

Combining reaction kinetics to the multi-phase Gibbs energy calculation

Combining reaction kinetics to the multi-phase Gibbs energy calculation 7 th European Symposium on Computer Aided Process Engineering ESCAPE7 V. Pesu and P.S. Agachi (Editors) 2007 Esevier B.V. A rights reserved. Combining reaction inetics to the muti-phase Gibbs energy cacuation

More information

Approach to Identifying Raindrop Vibration Signal Detected by Optical Fiber

Approach to Identifying Raindrop Vibration Signal Detected by Optical Fiber Sensors & Transducers, o. 6, Issue, December 3, pp. 85-9 Sensors & Transducers 3 by IFSA http://www.sensorsporta.com Approach to Identifying Raindrop ibration Signa Detected by Optica Fiber ongquan QU,

More information

Identification of macro and micro parameters in solidification model

Identification of macro and micro parameters in solidification model BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vo. 55, No. 1, 27 Identification of macro and micro parameters in soidification mode B. MOCHNACKI 1 and E. MAJCHRZAK 2,1 1 Czestochowa University

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

arxiv: v1 [cs.db] 1 Aug 2012

arxiv: v1 [cs.db] 1 Aug 2012 Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

pp in Backpropagation Convergence Via Deterministic Nonmonotone Perturbed O. L. Mangasarian & M. V. Solodov Madison, WI Abstract

pp in Backpropagation Convergence Via Deterministic Nonmonotone Perturbed O. L. Mangasarian & M. V. Solodov Madison, WI Abstract pp 383-390 in Advances in Neura Information Processing Systems 6 J.D. Cowan, G. Tesauro and J. Aspector (eds), Morgan Kaufmann Pubishers, San Francisco, CA, 1994 Backpropagation Convergence Via Deterministic

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled.

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled. imuation of the acoustic fied produced by cavities using the Boundary Eement Rayeigh Integra Method () and its appication to a horn oudspeaer. tephen Kirup East Lancashire Institute, Due treet, Bacburn,

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

Trainable fusion rules. I. Large sample size case

Trainable fusion rules. I. Large sample size case Neura Networks 19 (2006) 1506 1516 www.esevier.com/ocate/neunet Trainabe fusion rues. I. Large sampe size case Šarūnas Raudys Institute of Mathematics and Informatics, Akademijos 4, Vinius 08633, Lithuania

More information

On a geometrical approach in contact mechanics

On a geometrical approach in contact mechanics Institut für Mechanik On a geometrica approach in contact mechanics Aexander Konyukhov, Kar Schweizerhof Universität Karsruhe, Institut für Mechanik Institut für Mechanik Kaiserstr. 12, Geb. 20.30 76128

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW

More information