arxiv: v1 [stat.ml] 31 Jan 2018


Incremental kernel PCA and the Nyström method

Fredrik Hallgren
Department of Statistical Science, University College London, London WC1E 6BT, United Kingdom
fredrik.hallgren@ucl.ac.uk

Paul Northrop
Department of Statistical Science, University College London, London WC1E 6BT, United Kingdom
p.northrop@ucl.ac.uk

Abstract

Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nyström approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nyström approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained.

1 INTRODUCTION

Kernel methods make use of non-linear patterns in data whilst being able to use linear solution methods, through a non-linear transformation of data examples into a feature space where inner products correspond to the application of a kernel function between data examples (Hofmann et al., 2008). Many kernel methods have been conceived as the direct application of well-known linear methods in this feature space, occasionally reformulated to be expressed entirely in the form of inner products. This is the case for kernel PCA, obtained through the application of linear PCA in feature space (Schölkopf et al., 1998) and involving an eigendecomposition of the kernel matrix. It has been shown to outperform linear PCA in a number of applications (Chin and Suter, 2007).

Incremental algorithms, where a solution is updated for additional data examples, are often desirable.
If data arrives sequentially in time and a solution is required for each additional data example, more efficient incremental algorithms are often available than repeated application of a batch procedure. Furthermore, incremental algorithms often have a lower memory footprint than their batch counterparts.

In this paper, we propose a novel algorithm for incremental kernel PCA, which accounts for the change in mean in the covariance matrix from each additional data example. It works by writing the expanded mean-adjusted kernel matrix from an additional data point in terms of a number of rank one updates, to which a rank one update algorithm for the eigendecomposition can be applied. We use a rank one update algorithm based on work in Golub (1973) and Bunch et al. (1978).

A few previous exact incremental algorithms for kernel PCA have been proposed, some of which are based on the application of an incremental linear PCA method in feature space (Kim et al., 2005; Chin and Suter, 2007; Hoegaerts et al., 2007). Rank one update algorithms for the eigendecomposition have not previously been applied to kernel PCA, to the best of our knowledge. If the mean of the feature vectors is not adjusted, our algorithm corresponds to an incremental procedure for the eigendecomposition of the kernel matrix, which can be more widely applied.

Our algorithm has the same time and memory complexities as existing algorithms for incremental kernel PCA and it is more computationally efficient than the comparable algorithm in Chin and Suter (2007), which also allows for a change in mean. Furthermore, it can be considered more flexible, since it is straightforward to replace the rank one update algorithm we have used with a different one, for potentially improved efficiency. Approximate algorithms could also be applied, for example from randomized linear algebra (Mahoney, 2011).

The usefulness of kernel methods is limited by their large computational requirements in time and memory, which

scale in the number of data points, since the dimension of the transformed variables often is very large, or they are not explicitly available, and one therefore must express a solution in terms of transformed data examples. This is particularly true for kernel PCA since it requires an eigendecomposition of the kernel matrix, an expensive operation. As a remedy, various approximate methods have been introduced, such as the Nyström method (Williams and Seeger, 2001), which creates a low-rank approximation to the kernel matrix based on a randomly sampled subset of data examples.

We also extend our algorithm for incremental kernel PCA to incremental calculation of the Nyström approximation to the kernel matrix. We incrementally add data examples to the subset used to create the Nyström approximation to kernel PCA. This allows one to evaluate empirically the accuracy of the Nyström approximation for each added data example. Rudi et al. (2015) presented an incremental updating procedure for the Nyström approximation to kernel ridge regression, based on rank one updates to the Cholesky decomposition. Our proposed incremental procedure can be applied to any kernel method requiring the eigendecomposition or inverse of the kernel matrix. Combining an incremental algorithm with the Nyström method also leads to further improvements in memory efficiency, compared with either method on its own.

2 BACKGROUND

2.1 KERNEL METHODS

Kernel methods allow for the application of linear methods to discover non-linear patterns between variables, through a non-linear transformation of data points $\phi(x)$ into a feature space where linear algorithms can be applied (Hofmann et al., 2008). They rely on two things. First, the calculation of inner products between transformed data examples through a symmetric positive definite kernel $k(x, y)$; second, the expression of a solution linearly in the space of transformed data examples, rather than in the space of transformed variables.

We have a set of $n$ observations $\{x_i\}_{i=1}^{n}$.
Linear methods generally scale in the dimension of the observations. For example, if each $x_i$ is a real vector $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(d)})$, a linear method will scale as the number of variables $d$. Let each $x_i$ be an element from a set $\mathcal{X}$. In general, no further restrictions need to be placed on the set $\mathcal{X}$, which is a great benefit of kernel methods. For example, $\mathcal{X}$ can be a collection of text strings or graphs (Lodhi et al., 2002; Vishwanathan et al., 2010).

Let $\mathcal{H}$ be a Hilbert space of real-valued functions on $\mathcal{X}$, with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. If $\mathcal{X}$ is a vector space, then $\mathcal{H}$ is a closed subspace of $\mathcal{X}^*$, the dual space of bounded linear functionals on $\mathcal{X}$. Consider $\mathcal{H}^*$, the dual space of linear functionals on $\mathcal{H}$. For each $x \in \mathcal{X}$ there is an element $\delta_x \in \mathcal{H}^*$ such that $\delta_x(f) = f(x)$, termed the evaluation functional. If $\delta_x$ is bounded (i.e. continuous), then by the Riesz representation theorem there is a unique element $g_x \in \mathcal{H}$ such that $\delta_x(f) = \langle g_x, f \rangle_{\mathcal{H}}$ (Bollobás, 1999). If we consider $g_x$ as a function of $x$, say $k(x, \cdot)$, then $k(x, \cdot)$ has the reproducing property, i.e. $\langle k(x, \cdot), f(\cdot) \rangle_{\mathcal{H}} = f(x)$. Furthermore, by the reproducing property, we have $\langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal{H}} = k(x, y)$. Then $k(x, y)$ is a symmetric positive definite function by the symmetric positive definite property of the inner product.

The function $k(x, \cdot)$ is also often denoted by $\phi(x)$, termed a feature map. The space $\mathcal{H}$ has uncountable dimension, but since every (separable) Hilbert space is isometrically isomorphic to $\ell^2$, the space of square-summable sequences (Bollobás, 1999), each element $\phi(x_i)$ has a representation as a vector $\phi(x_i) = (\phi_1(x_i), \phi_2(x_i), \ldots, \phi_d(x_i))$ over $\mathbb{R}$ with $\langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}} = \sum_{k=1}^{d} \phi_k(x_i)\phi_k(x_j)$. We call these feature vectors. However, this representation is often not known, or $d$ is very large, so it might not be possible to apply a linear method directly on the variables $\phi_1(x), \phi_2(x), \ldots, \phi_d(x)$.
Thanks to the representer theorem (Schölkopf et al., 2001), a solution can instead often be expressed in terms of elements in $\mathcal{H}$, as $f(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x)$ with coefficients $\alpha_i$. We arrange the feature vectors along the rows of a data matrix $\Phi$. The kernel matrix is given by $K := (k(x_i, x_j))_{i,j} = \Phi\Phi^T \in \mathbb{R}^{n \times n}$.

2.2 KERNEL PCA

PCA finds the set of orthogonal linear combinations of variables that maximizes the variance of each linear combination in turn. PCA can be used for dimensionality reduction, in regression and classification problems, and to detect outliers, among other applications (Jolliffe, 2002). The principal components are obtained by calculating the eigendecomposition of the sample covariance matrix $C = \frac{1}{n} X^T X$, for a data matrix of (centred) observations $X$, where each observation occupies a row. This gives the decomposition $C = V \Lambda V^T$ where the columns of $V$ are the directions of maximum variance. The principal components can also be obtained through the related singular value decomposition (SVD).

Assuming centred data, kernel PCA performs the eigendecomposition of the covariance matrix in feature space through (Schölkopf et al., 1998)

$$\frac{1}{n} \Phi^T \Phi v = \lambda v$$

resulting in the decomposition $\frac{1}{n} \Phi^T \Phi = V \Sigma V^T$. Henceforth we will ignore the factor $\frac{1}{n}$ and only be concerned with the eigendecomposition of $\Phi^T \Phi$. Noting that $\mathrm{span}\{\Phi^T\} = \mathrm{span}\{V\}$, we can write $v$ in terms of an $n$-dimensional vector $u$ as $v = \Phi^T u$. Left-multiplying the eigenvalue equation by $\Phi$ we obtain $Ku = \lambda u$ and the decomposition $K = U \Lambda U^T$.

If the data vectors in feature space are not assumed to be centred, we need to subtract the mean of each variable from $\Phi$ and instead calculate the eigendecomposition of

$$K' = (\Phi - \mathbf{1}_n \Phi)(\Phi - \mathbf{1}_n \Phi)^T = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n \tag{1}$$

where $\mathbf{1}_n$ is an $n \times n$ matrix for which $(\mathbf{1}_n)_{i,j} = \frac{1}{n}$, i.e. with every element equal to $\frac{1}{n}$.

2.3 INCREMENTAL KERNEL PCA

Incremental algorithms update an existing solution for one or several additional data examples, also referred to as online learning. The goal is that specialized algorithms will achieve greater time or memory performance than repeated application of batch procedures. There are many use cases for incremental versions of batch algorithms, for example when memory capacity is constrained, or when data examples arrive sequentially in time, termed streaming data, and a solution is desired for each additional data example.

A few algorithms for exact incremental kernel PCA have been proposed. The algorithm in Chin and Suter (2007) is based on the incremental linear PCA algorithm from Lim et al. (2004). The time complexity is $O(n^3)$ and the memory complexity $O(n^2)$. Hoegaerts et al. (2007) write the kernel matrix expanded with an additional data example in terms of two rank one updates, without adjusting for a change in mean, and hence propose an algorithm to update a subset of dominant eigenvalues and corresponding eigenvectors. If the algorithm is applied to update all eigenpairs, the complexities in time and memory are $O(n^3)$ and $O(n^2)$, respectively.
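Returning to Equation (1) in Section 2.2, the identity can be checked numerically when the feature vectors are known explicitly. The following is a minimal sketch, assuming a small random data matrix with explicit features (so that $K = \Phi\Phi^T$); the helper name `centre_kernel` and the test data are illustrative and not taken from the paper.

```python
import numpy as np

def centre_kernel(K):
    """Mean-adjusted kernel matrix K' = K - 1_n K - K 1_n + 1_n K 1_n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)   # n x n matrix with every element 1/n
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 3))          # explicit feature vectors along the rows
K = Phi @ Phi.T                        # kernel matrix K = Phi Phi^T
K_adj = centre_kernel(K)

# Direct route: subtract the mean of each variable, then form the Gram matrix.
Phi_c = Phi - Phi.mean(axis=0)
direct = Phi_c @ Phi_c.T

# The non-zero eigenvalues of K' coincide with those of Phi_c^T Phi_c,
# which is the covariance matrix in feature space up to the factor 1/n.
lam_K = np.linalg.eigvalsh(K_adj)[::-1][:3]
lam_C = np.linalg.eigvalsh(Phi_c.T @ Phi_c)[::-1]
```

Both routes give the same matrix, which illustrates why only $K$ is needed when the feature vectors are not available.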
Iterative algorithms produce a sequence of improving approximate solutions that converges to the exact solution as the number of steps increases (Golub and Van Loan, 2013). An iterative algorithm can often be made to operate efficiently in an incremental fashion, by expanding the data set with additional data examples and restarting the iterative procedure. An example of an iterative method for kernel PCA that can be made to operate incrementally is the kernel Hebbian algorithm (Kim et al., 2005), based on the generalized Hebbian algorithm (Oja, 1982) applied in feature space.

Various approximations to incremental kernel PCA have also been proposed. See for example Tokumoto and Ozawa (2011) or Sheikholeslami et al. (2015). Since we present an exact algorithm for incremental kernel PCA, we will not describe these or similar works further.

2.4 THE NYSTRÖM METHOD

The Nyström method (Williams and Seeger, 2001) randomly samples $m$ data examples from the full dataset, often uniformly, and calculates a low-rank approximation $\tilde{K}$ to the full kernel matrix through

$$\tilde{K} = K_{n,m} K_{m,m}^{-1} K_{m,n}$$

where $K_{n,m}$ is an $n \times m$ matrix obtained by choosing $m$ columns from the original matrix $K$, $K_{m,n}$ is its transpose and $K_{m,m}$ contains the intersection of the same $m$ columns and rows.

3 KERNEL PCA THROUGH RANK ONE UPDATES

In this section we present an algorithm for incremental kernel PCA based on rank one updates to the eigendecomposition of the kernel matrix $K$, or the mean-adjusted kernel matrix $K'$. Any incremental algorithm for the eigendecomposition of the kernel matrix $K$ can be applied where the explicit or implicit inverse of the same is required, such as kernel regression and kernel SVM. Various methods other than kernel PCA are also based on the eigendecomposition of the kernel matrix, such as kernel FDA (Mika et al., 1999). Even when more efficient solution methods are available, access to the eigendecomposition can be highly useful for statistical regularization or controlling numerical stability.
In contrast to the covariance matrix in linear PCA, the kernel matrix expands in size for each additional data point, which needs to be taken into account, and the effect on the eigensystem determined. We write the kernel matrix $K_{m+1,m+1}$ created with $m+1$ data examples in terms of an expansion and a sequence of symmetric rank one updates to the kernel matrix $K_{m,m}$, and apply a rank one update algorithm to the eigendecomposition of $K_{m,m}$ to obtain the eigendecomposition of $K_{m+1,m+1}$.

A number of algorithms have been suggested to perform rank one modification to the symmetric eigenproblem. Golub (1973) presented a procedure to determine the eigenvalues of a diagonal matrix updated through a rank one perturbation. Bunch et al. (1978) extended the results to the determination of both eigenvalues and eigenvectors of an arbitrary perturbed matrix, including an improved procedure to determine the eigenvalues. Stability issues in the calculation of the eigenvectors, including loss of numerical orthogonality, later motivated several improvements (Dongarra and Sorensen, 1987; Sorensen and Tang, 1991; Gu and Eisenstat, 1994). Alternatively, one could potentially employ update algorithms for the singular value decomposition, such as the algorithm suggested in Brand (2006) for the thin singular value decomposition.

We use the rank one update algorithm for eigenvalues from Golub (1973) and determine the eigenvectors according to Bunch et al. (1978). In the experiments our approach seems to be sufficiently stable and accurate for most use cases. We assume throughout that the kernel matrix remains non-singular after each update.

Our algorithm has the same time and memory complexities as competing methods. The algorithm most comparable to ours is the one in Chin and Suter (2007), which also accounts for a change in mean. If one additional data example is added incrementally, and all eigenpairs are retained, it requires the eigendecomposition of an $m \times m$ matrix, the eigendecomposition of the unadjusted kernel matrix, and a multiplication of two $m \times m$ matrices at each step. Since a multiplication of two $m \times m$ matrices requires $2m^3$ flops, and the state-of-the-art QR algorithm for the symmetric eigenproblem about $9m^3$ flops (Golub and Van Loan, 2013), the algorithm thus requires $20m^3$ flops to the $O(m^3)$ factor. Our proposed algorithm requires $8m^3$ flops to the $O(m^3)$ factor if the mean is adjusted, and $4m^3$ flops otherwise, from one multiplication of two $(m+1) \times (m+1)$ matrices for each rank one update. Our algorithm is thus more than twice as efficient.

3.1 RANK ONE UPDATE PROCEDURE

If we know the eigendecomposition of $K_{m,m} = U_m \Lambda_m U_m^T$ and write $K_{m+1,m+1}$ in terms of an expansion and a number of symmetric rank one updates to $K_{m,m}$, we can then apply a rank one update algorithm to obtain the eigendecomposition of $K_{m+1,m+1} = U_{m+1} \Lambda_{m+1} U_{m+1}^T$.

3.1.1 Zero-mean data

If we assume that the data examples have zero mean in feature space, then the mean does not need to be updated for previous data points and $K_{m,m}$ only needs to be expanded with an additional row and column. In this case we can devise a rank one update procedure from $K_{m,m}$ to $K_{m+1,m+1}$ in two steps. We denote $k_{i,j} = k(x_i, x_j)$ and $a = [k_{1,m+1} \; k_{2,m+1} \; \cdots \; k_{m,m+1}]^T$, i.e. a column vector with elements $k_{1,m+1}, k_{2,m+1}, \ldots, k_{m,m+1}$, and let

$$v_1 = \left[a^T \;\; \tfrac{1}{2} k_{m+1,m+1}\right]^T, \qquad v_2 = \left[a^T \;\; \tfrac{1}{4} k_{m+1,m+1}\right]^T, \qquad \sigma = 4 / k_{m+1,m+1}$$

Then we have

$$K_{m+1,m+1} = \begin{bmatrix} K_{m,m} & 0 \\ 0^T & \tfrac{1}{4} k_{m+1,m+1} \end{bmatrix} + \sigma v_1 v_1^T - \sigma v_2 v_2^T := K^0_{m,m} + \sigma v_1 v_1^T - \sigma v_2 v_2^T \tag{2}$$

corresponding to an expansion of $K_{m,m}$ to $K^0_{m,m}$ and two rank one updates, where $0$ is a column vector of zeros. Compared to the eigensystem of $K_{m,m}$, $K^0_{m,m}$ will have an additional eigenvalue $\lambda_{m+1} = \tfrac{1}{4} k_{m+1,m+1}$ and corresponding eigenvector $u_{m+1} = [0 \; 0 \; \cdots \; 0 \; 1]^T$.

The matrix $K^0_{m,m}$ is symmetric positive semi-definite (SPSD), since all eigenvalues are positive. It will remain SPSD after the first update, since it is a sum of two SPSD matrices, as $v_1 v_1^T$ is a Gram matrix, if each element is instead seen as a separate vector. The resulting matrix after the second update will be SPSD since this holds for $K_{m+1,m+1}$.

The algorithm for one updating iteration is described in Algorithm 1, given a function rankoneupdate($\sigma$, $v$, $L$, $U$) that updates the eigenvalues $L$ and eigenvectors $U$ from a rank one additive perturbation $\sigma v v^T$.

Algorithm 1 Incremental eigendecomposition of kernel matrix
Input: Dataset $\{x_i\}_{i=1}^{m+1}$; row vector of eigenvalues $L$ and matrix of eigenvectors $U$ of $K_{m,m}$; kernel function $k(\cdot, \cdot)$
Output: Eigenvalues $L$ and eigenvectors $U$ of $K_{m+1,m+1}$
    $L \leftarrow [L \;\; k_{m+1,m+1}/4]$
    $U \leftarrow \begin{bmatrix} U & 0 \\ 0^T & 1 \end{bmatrix}$
    sigma $\leftarrow 4 / k_{m+1,m+1}$
    k1 $\leftarrow [k_{1,m+1} \;\; k_{2,m+1} \;\; \ldots \;\; k_{m+1,m+1}/2]$
    k0 $\leftarrow [k_{1,m+1} \;\; k_{2,m+1} \;\; \ldots \;\; k_{m+1,m+1}/4]$
    $L, U \leftarrow$ rankoneupdate(sigma, k1, $L$, $U$)
    $L, U \leftarrow$ rankoneupdate($-$sigma, k0, $L$, $U$)
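The decomposition in Equation (2) can be verified numerically before any eigenvalue update is applied. The sketch below is illustrative only: it assumes a small random dataset and an RBF kernel with unit bandwidth (both chosen here for the example, not taken from the paper), builds the expanded matrix $K^0_{m,m}$, and adds the two rank one terms.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))                        # illustrative data, 5 points
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_full = np.exp(-sq)                               # assumed RBF kernel, K_{m+1,m+1}

m = 4
K_mm = K_full[:m, :m]                              # K_{m,m}
a = K_full[:m, m]                                  # [k_{1,m+1}, ..., k_{m,m+1}]
kappa = K_full[m, m]                               # k_{m+1,m+1}

# Expansion K^0_{m,m}: block-diagonal with kappa / 4 in the new corner.
K0 = np.zeros((m + 1, m + 1))
K0[:m, :m] = K_mm
K0[m, m] = kappa / 4.0

v1 = np.append(a, kappa / 2.0)
v2 = np.append(a, kappa / 4.0)
sigma = 4.0 / kappa

# Two symmetric rank one updates, as in Equation (2).
K_rebuilt = K0 + sigma * np.outer(v1, v1) - sigma * np.outer(v2, v2)

# The expansion only adds the eigenvalue kappa / 4 to the spectrum of K_{m,m}.
eig_K0 = np.linalg.eigvalsh(K0)
eig_expected = np.sort(np.append(np.linalg.eigvalsh(K_mm), kappa / 4.0))
```

The top-left block of the two rank one terms cancels, the off-diagonal part contributes $a$, and the corner contributes $\tfrac{3}{4} k_{m+1,m+1}$, which together with the expansion gives $K_{m+1,m+1}$.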

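If we limit ourselves to kernel functions for which $k(x, x)$ is constant, without loss of generality we can set $k(x, x) = 1$ and the above expression simplifies.

3.1.2 Mean-adjusted data

To construct a rank one update procedure from $K'_{m,m}$ to $K'_{m+1,m+1}$, all the elements of $K'_{m,m}$ need to be adjusted in addition to the expansion with another row and column. We first devise two rank one updates that adjust the mean of $K'_{m,m}$ to account for the additional data example. We then expand the resulting matrix and perform symmetric updates to set the last row and column to the required values, similarly to (2).

Recall that when taking the mean into account, one performs an eigendecomposition of the adjusted kernel matrix $K' = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$. The elements of $K'_{m,m}$ can thus be adjusted through the following formula

$$K''_{m,m} := (K'_{m+1,m+1})_{1:m,1:m} = K'_{m,m} + \mathbf{1}_m K_{m,m} + K_{m,m} \mathbf{1}_m - \mathbf{1}_m K_{m,m} \mathbf{1}_m + \left(\mathbf{1}_{m+1} K_{m+1,m+1} \mathbf{1}_{m+1} - \mathbf{1}_{m+1} K_{m+1,m+1} - K_{m+1,m+1} \mathbf{1}_{m+1}\right)_{1:m,1:m}$$

where $(\cdot)_{1:m,1:m}$ denotes the first $m$ rows and columns of a matrix. The latter six terms are all rank one matrices. The matrices $\mathbf{1}_m K_{m,m}$ and $(\mathbf{1}_{m+1} K_{m+1,m+1})_{1:m,1:m}$ are constant along the columns, and hence so is their difference, and similarly for the rows of $K_{m,m} \mathbf{1}_m - (K_{m+1,m+1} \mathbf{1}_{m+1})_{1:m,1:m}$. The matrix $\mathbf{1}_m K_{m,m} \mathbf{1}_m$ has constant entries, equal to the sum of all elements of $K_{m,m}$ multiplied by a factor $1/m^2$, and similarly for $(\mathbf{1}_{m+1} K_{m+1,m+1} \mathbf{1}_{m+1})_{1:m,1:m}$. Consequently, all terms can be written as two rank one updates. We have

$$\mathbf{1}_m K_{m,m} - (\mathbf{1}_{m+1} K_{m+1,m+1})_{1:m,1:m} = \frac{1}{m+1} \mathbb{1} \left(\frac{1}{m} \mathbb{1}^T K_{m,m} - a^T\right)$$

$$K_{m,m} \mathbf{1}_m - (K_{m+1,m+1} \mathbf{1}_{m+1})_{1:m,1:m} = \frac{1}{m+1} \left(\frac{1}{m} K_{m,m} \mathbb{1} - a\right) \mathbb{1}^T$$

with $a$ as in section 3.1.1 above and where $\mathbb{1}$ is a column vector of ones. Since $K_{m,m}$ is symmetric for all $m$, we have $\mathbf{1}_m K_{m,m} = (K_{m,m} \mathbf{1}_m)^T$ and $(\mathbf{1}_{m+1} K_{m+1,m+1})_{1:m,1:m} = (K_{m+1,m+1} \mathbf{1}_{m+1})^T_{1:m,1:m}$, and can set

$$u = \frac{1}{m(m+1)} K_{m,m} \mathbb{1} - \frac{1}{m+1} a + \frac{1}{2} C \mathbb{1}, \qquad C = -\frac{1}{m^2} \Sigma_m + \frac{1}{(m+1)^2} \Sigma_{m+1}$$

where we have denoted $\Sigma_m = \mathbb{1}^T K_{m,m} \mathbb{1}$, the sum of all elements of $K_{m,m}$, to obtain

$$K''_{m,m} = K'_{m,m} + \mathbb{1} u^T + u \mathbb{1}^T = K'_{m,m} + \tfrac{1}{2} (\mathbb{1} + u)(\mathbb{1} + u)^T - \tfrac{1}{2} (\mathbb{1} - u)(\mathbb{1} - u)^T$$

which is two symmetric rank one updates to $K'_{m,m}$. $\Sigma_m$ and $K_{m,m} \mathbb{1}$ can easily be updated between iterations like so

$$\Sigma_{m+1} = \Sigma_m + 2 a^T \mathbb{1} + k_{m+1,m+1}, \qquad K_{m+1,m+1} \mathbb{1}_{m+1} = \left[K_{m,m} \mathbb{1}_m + a; \; a^T \mathbb{1}_m + k_{m+1,m+1}\right]$$

where $[b; c]$ denotes a column vector $b$ expanded with an additional element $c$.

We now expand $K''_{m,m}$ to $K'_{m+1,m+1}$, analogously to (2), but taking the adjusted mean into account.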
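The mean adjustment derived above, before the expansion step, can be checked numerically. The following is a minimal sketch under stated assumptions: the data, the RBF kernel with unit bandwidth and the helper `centre_kernel` are illustrative choices and not part of the paper. It forms $u$ and $C$ from the running sums and compares the two rank one updates with the leading $m \times m$ block of the batch-centred matrix.

```python
import numpy as np

def centre_kernel(K):
    """Batch mean adjustment K' = K - 1_n K - K 1_n + 1_n K 1_n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

rng = np.random.default_rng(2)
X = rng.normal(size=(7, 3))                        # illustrative data
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_full = np.exp(-sq)                               # assumed RBF kernel

m = 6
K_mm, a, kappa = K_full[:m, :m], K_full[:m, m], K_full[m, m]

ones = np.ones(m)
row_sums = K_mm @ ones                             # K_{m,m} 1
S_m = row_sums.sum()                               # Sigma_m
S_m1 = S_m + 2.0 * a.sum() + kappa                 # Sigma_{m+1}
C = -S_m / m**2 + S_m1 / (m + 1) ** 2
u = row_sums / (m * (m + 1)) - a / (m + 1) + 0.5 * C * ones

K_adj_m = centre_kernel(K_mm)                      # K'_{m,m}
target = centre_kernel(K_full)[:m, :m]             # K''_{m,m}

# Two symmetric rank one updates with weights +1/2 and -1/2.
updated = (K_adj_m
           + 0.5 * np.outer(ones + u, ones + u)
           - 0.5 * np.outer(ones - u, ones - u))
```

Only the row sums and the total sum of $K_{m,m}$ enter the update, which is why neither the unadjusted nor the adjusted matrix has to be stored.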
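The required last row and column is given by

$$v := k - \frac{1}{m+1} \left(\mathbb{1}_{m+1} \mathbb{1}_{m+1}^T k + K_{m+1,m+1} \mathbb{1}_{m+1} - \frac{\Sigma_{m+1}}{m+1} \mathbb{1}_{m+1}\right)$$

with $k = [a^T \;\; k(x_{m+1}, x_{m+1})]^T$ and $\mathbb{1}_{m+1}$ a column vector of $m+1$ ones. If we let

$$v_1 = \left[(v)_{1:m}; \; \tfrac{1}{2} (v)_{m+1}\right], \qquad v_2 = \left[(v)_{1:m}; \; \tfrac{1}{4} (v)_{m+1}\right], \qquad \sigma = 4 / (v)_{m+1}$$

where $(v)_{1:m}$ is a vector of the first $m$ elements of $v$, and $(v)_{m+1}$ is its last element, we have

$$K'_{m+1,m+1} = \begin{bmatrix} K''_{m,m} & 0 \\ 0^T & \tfrac{1}{4} (v)_{m+1} \end{bmatrix} + \sigma v_1 v_1^T - \sigma v_2 v_2^T := K''^{\,0}_{m,m} + \sigma v_1 v_1^T - \sigma v_2 v_2^T \tag{3}$$

We have thus devised a procedure to update $K'_{m,m}$ to $K'_{m+1,m+1}$ using four symmetric rank one updates, for which a rank one eigendecomposition update algorithm can be applied. The full procedure is described in Algorithm 2. Note that the matrix $K'_{m,m}$ or its expansion do not need to be kept in memory. The procedure is linear in time and memory, since all constituent quantities are updated incrementally.

Algorithm 2 Incremental eigendecomposition of adjusted kernel matrix
Input: Dataset $\{x_i\}_{i=1}^{m+1}$; row vector of eigenvalues $L$ and matrix of eigenvectors $U$ of $K'_{m,m}$; kernel function $k(\cdot, \cdot)$; sum of all elements of $K_{m,m}$, denoted S; sum of rows of $K_{m,m}$, i.e. $K_{m,m} \mathbb{1}$, denoted K1
Output: Eigenvalues $L$ and eigenvectors $U$ of $K'_{m+1,m+1}$
    a $\leftarrow [k_{1,m+1} \;\; k_{2,m+1} \;\; \ldots \;\; k_{m,m+1}]$
    S2 $\leftarrow$ S $+ 2\,$sum(a) $+ k_{m+1,m+1}$
    C $\leftarrow -$S$/m^2 +$ S2$/(m+1)^2$
    u $\leftarrow$ K1$/(m(m+1)) -$ a$/(m+1) + 0.5\,$C$\,$ones($m$)
    $L, U \leftarrow$ rankoneupdate(0.5, ones($m$) $+$ u, $L$, $U$)
    $L, U \leftarrow$ rankoneupdate($-0.5$, ones($m$) $-$ u, $L$, $U$)
    K1 $\leftarrow [$K1 $+$ a $\;\;$ sum(a) $+ k_{m+1,m+1}]$
    S $\leftarrow$ S2
    $m \leftarrow m + 1$
    v $\leftarrow$ k $- ($ones($m$)$\,($sum(a) $+ k_{m,m}) +$ K1 $-$ S$/m)/m$, with k $= [$a $\;\; k_{m,m}]$
    v0 $\leftarrow$ v$[m]$
    v $\leftarrow$ v$[1 : m-1]$
    $L \leftarrow [L \;\;$ v0$/4]$
    $U \leftarrow \begin{bmatrix} U & 0 \\ 0^T & 1 \end{bmatrix}$
    sigma $\leftarrow 4/$v0
    v1 $\leftarrow [$v $\;\;$ v0$/2]$
    v2 $\leftarrow [$v $\;\;$ v0$/4]$
    $L, U \leftarrow$ rankoneupdate(sigma, v1, $L$, $U$)
    $L, U \leftarrow$ rankoneupdate($-$sigma, v2, $L$, $U$)

3.2 UPDATE ALGORITHM FOR THE EIGENDECOMPOSITION

Here we describe an algorithm for updating the eigendecomposition after a rank one perturbation. Suppose we know the eigendecomposition of a symmetric matrix $A = U \Lambda U^T$. Let

$$B = U \Lambda U^T + \sigma v v^T = U (\Lambda + \sigma z z^T) U^T$$

where $z = U^T v$, and look for the eigendecomposition of $\tilde{B} = \Lambda + \sigma z z^T := \tilde{U} \tilde{\Lambda} \tilde{U}^T$ (Bunch et al., 1978). Then the eigendecomposition of $B$ is given by $U \tilde{U} \tilde{\Lambda} \tilde{U}^T U^T$, with the eigenvalues of $\tilde{B}$ unchanged and eigenvectors $U_B := U \tilde{U}$, since the product of two orthogonal matrices is orthogonal and since the eigendecomposition is unique, provided all eigenvalues are distinct. The eigenvalues of $\tilde{B}$ can be calculated in $O(n^2)$ time by finding the roots of the secular equation (Golub, 1973)

$$\omega(\tilde{\lambda}) := 1 + \sigma \sum_{i=1}^{n} \frac{z_i^2}{\lambda_i - \tilde{\lambda}} \tag{4}$$

The eigenvalues of the modified system are subject to the following bounds

$$\begin{aligned}
\lambda_i \le \tilde{\lambda}_i \le \lambda_{i+1}, &\quad i = 1, 2, \ldots, n-1, \;\; \sigma > 0 \\
\lambda_n \le \tilde{\lambda}_n \le \lambda_n + \sigma z^T z, &\quad \sigma > 0 \\
\lambda_{i-1} \le \tilde{\lambda}_i \le \lambda_i, &\quad i = 2, 3, \ldots, n, \;\; \sigma < 0 \\
\lambda_1 + \sigma z^T z \le \tilde{\lambda}_1 \le \lambda_1, &\quad \sigma < 0
\end{aligned} \tag{5}$$

which can be used to supply initial guesses for the root finding algorithm. Note that after expanding the eigensystem, as described above, the eigenpairs need to be reordered for the bounds to be valid.

Once the updated eigenvalues have been calculated the eigenvectors of the perturbed matrix $B$ are given by (Bunch et al., 1978)

$$u_i^B = \frac{U D_i^{-1} z}{\lVert D_i^{-1} z \rVert} \tag{6}$$

where $D_i := \Lambda - \tilde{\lambda}_i I$. Since $U$ and $D_i$ are $m \times m$ and $D_i$ is diagonal the denominator is $O(m)$ and the numerator is $O(m^2)$, leading to $O(m^3)$ time complexity to update all eigenvectors.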
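The update described by Equations (4) to (6) can be sketched in a few lines and combined with the zero-mean procedure of Algorithm 1. This is a minimal illustration under stated assumptions, not the authors' implementation: the roots of the secular equation are found by plain bisection inside the bounds (5) rather than by the root finder of Golub (1973), no deflation is performed (distinct eigenvalues and non-zero entries of $z$ are assumed), and the function names, data and RBF kernel are chosen for the example.

```python
import numpy as np

def rank_one_update(sigma, v, L, U, iters=200):
    """Eigendecomposition of U diag(L) U^T + sigma * v v^T (no deflation)."""
    order = np.argsort(L)                  # reorder so that the bounds (5) apply
    L, U = L[order], U[:, order]
    z = U.T @ v
    n, zz = L.size, z @ z
    s = 1.0 if sigma > 0 else -1.0

    def secular(lam):                      # Equation (4)
        return 1.0 + sigma * np.sum(z**2 / (L - lam))

    new_L = np.empty(n)
    for i in range(n):
        if sigma > 0:                      # bracketing interval from (5)
            lo = L[i]
            hi = L[i + 1] if i < n - 1 else L[n - 1] + sigma * zz
        else:
            lo = L[i - 1] if i > 0 else L[0] + sigma * zz
            hi = L[i]
        for _ in range(iters):             # s * secular is increasing on (lo, hi)
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:
                break
            if s * secular(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        new_L[i] = 0.5 * (lo + hi)

    # Equation (6): column i is D_i^{-1} z with D_i = diag(L) - new_L[i] * I.
    Q = z[:, None] / (L[:, None] - new_L[None, :])
    Q /= np.linalg.norm(Q, axis=0)
    return new_L, U @ Q

def add_point(L, U, a, kappa):
    """One iteration of Algorithm 1 (zero-mean case)."""
    m = L.size
    L = np.append(L, kappa / 4.0)
    U_exp = np.zeros((m + 1, m + 1))
    U_exp[:m, :m] = U
    U_exp[m, m] = 1.0
    sigma = 4.0 / kappa
    L, U_exp = rank_one_update(sigma, np.append(a, kappa / 2.0), L, U_exp)
    return rank_one_update(-sigma, np.append(a, kappa / 4.0), L, U_exp)

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))                # illustrative data
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / 2.0)                      # assumed RBF kernel

L, U = np.linalg.eigh(K[:3, :3])           # batch start with m = 3
for m in range(3, 6):                      # add the remaining points one by one
    L, U = add_point(L, U, K[:m, m], K[m, m])

recon_err = np.linalg.norm(U @ np.diag(L) @ U.T - K)
orth_err = np.linalg.norm(U.T @ U - np.eye(6))
```

Bisection is slower than a dedicated secular equation solver but makes the role of the bounds explicit. A practical implementation would add deflation for near-zero entries of $z$ and nearly equal eigenvalues.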
The number of flops for the full procedure is $2n^3 + O(n^2)$. Equation (6) requires the creation of an additional $n \times n$ matrix, hence the full procedure is quadratic in memory.

4 INCREMENTAL NYSTRÖM

In this section we extend our proposed algorithm to incremental calculation of the Nyström approximation to the kernel matrix. Having access to an incremental procedure for the Nyström method can be highly useful. Different sizes of subsets used in the approximation can efficiently be evaluated, to determine a suitable size for the problem at hand or for empirical investigation of the characteristics of the Nyström method for subsets of different sizes. For very large datasets, the combination of the Nyström method with incremental calculation results in further gains in memory efficiency.

Rudi et al. (2015) previously proposed an incremental algorithm for the Nyström approximation applied to kernel ridge regression, based on rank one updates to the Cholesky decomposition. Our proposed procedure can be seen as a generalization of their work. To the best of our knowledge, it is the first incremental algorithm for calculation of the full Nyström approximation to the kernel matrix.

Given the eigenvalues $\Lambda_m$ and eigenvectors $U_m$ of the matrix $K_{m,m}$, the corresponding approximate eigenvalues and eigenvectors of $K$ are given by (Williams and Seeger, 2001)

$$\Lambda^{nys} := \frac{n}{m} \Lambda_m, \qquad U^{nys} := \sqrt{\frac{m}{n}} \, K_{n,m} U_m \Lambda_m^{-1} \tag{7}$$

To obtain an incremental procedure for $\tilde{K} = U^{nys} \Lambda^{nys} U^{nys\,T}$, calculate $U_m$ and $\Lambda_m$ incrementally using Algorithm 1 (2), then at each iteration add an extra column to $K_{n,m}$ corresponding to the additional data example, and calculate the rescaling (7). The rescaling has $O(m^2 n)$ time complexity from the matrix product in (7). Note that the proposed incremental calculation of the Nyström approximation exactly reproduces batch computation at each $m$, save for numerical differences. The accuracy of the Nyström approximation has been extensively studied, including comparisons with other methods (Gittens and Mahoney, 2016; Yang et al., 2012).

[Figure 1: Difference between batch and incremental calculation of $K'$ of size $20+m$ for the two datasets. Panels: magic, yeast. Vertical axis: norm. Series: Frobenius, trace, spectral, Frobenius mean, trace mean, spectral mean.]

5 EXPERIMENTAL ANALYSIS

In this section we present the results of a number of experiments. We run the experiments on two different datasets from the UCI Machine Learning Repository (Lichman, 2013), the simulated Magic gamma telescope dataset and the Yeast dataset, containing cellular protein location sites. Where applicable, we remove the target variable when this is categorical and not continuous. Throughout the experiments we use the radial basis functions kernel

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert_2^2}{\sigma}\right)$$

where $\sigma$ is a parameter. For each dataset, we set $\sigma$ to be the median of the distances between all pairs of data examples (in a subset of the full dataset), a common heuristic. Source code in Python is available at

5.1 INCREMENTAL KERNEL PCA

We implement and evaluate our algorithm for incremental kernel PCA both with and without adjustment of the mean of the feature vectors. Numerical accuracy is generally good, whether adjusting the mean or not. A slight loss of orthogonality is discovered in the eigenvectors, as measured by how close $UU^T$ is to the identity, particularly for mean-adjusted data that requires four updates at each step and involves more numerical operations.

We have previously assumed that the kernel matrix remains of full rank after each added data example. This will always be the case in theory if data contains noise, however near numerical rank deficiency can cause issues in practice.
Equation (4) may then lack the required number of roots. In this instance one can deflate the matrix (see e.g. Bunch et al. (1978) for details), but for the purposes of our experiments we have contented ourselves with excluding the specific data example from the algorithm. An excluded data point does not add any time overhead to the $O(n^3)$ factor.

Every numerical operation leads to a small loss in accuracy, due to the finite representation of floating-point numbers, which is propagated, with varying severity, over subsequent operations. An incremental procedure involves substantially more operations than a batch procedure, which leads to worse accuracy in comparison, often termed drift. We illustrate this by plotting the Frobenius, spectral and trace norms of the difference between the adjusted kernel matrix $K'_{m,m}$ and the reconstruction using the incrementally calculated eigendecomposition, for different numbers of data points, i.e. $K'_{m,m} - U_m \Lambda_m U_m^T$. We plot the difference for one run of the algorithm as well as the mean difference for each value of $m$ over 50 runs. Please see Figure 1. The drift for reconstruction of the unadjusted matrix is smaller and is not plotted. Our results show that the drift is small.

5.2 INCREMENTAL NYSTRÖM

We implement the proposed incremental calculation of the Nyström approximation, using the first 1000 observations from each dataset. Having access to an incremental algorithm for calculating the Nyström approximation lets us investigate explicitly how the approximation improves with each additional data point for a specific data set. We calculate the Frobenius norm, spectral norm and trace norm of the difference between the Nyström approximation and the full kernel matrix at each step of the algorithm. All three norms can be of interest to a downstream machine learning practitioner (Gittens and Mahoney, 2016). Again, we plot the results for one run of the algorithm and for an average of 50 runs. Please see Figure 2. As seen in the plots, the Nyström approximation seems to provide a high degree of accuracy in approximating the matrix $K$, even for a fairly small number of basis points.

[Figure 2: Difference between $K$ and $\tilde{K}$ of size $20+m$ for the two datasets. Panels: magic, yeast. Vertical axis: norm. Series: Frobenius, trace, spectral, Frobenius mean, trace mean, spectral mean.]

6 CONCLUSION

We have in this paper presented an algorithm for incremental kernel PCA based on rank one updates to the eigendecomposition of the kernel matrix $K$ or the mean-adjusted kernel matrix $K'$, which we extended to incremental calculation of the Nyström approximation to the kernel matrix. Rank one update algorithms for the eigendecomposition other than the one chosen in this paper could also be applied to the kernel PCA problem, for potentially improved accuracy and efficiency, including algorithms potentially not yet conceived.
Furthermore, it could be straightforward to adapt the proposed algorithm for incremental kernel PCA to only maintain a subset of the eigenvectors and eigenvalues.

An incremental procedure for the Nyström method can aid in determining a suitable size of the subset used for the approximation through empirical evaluation. A fairly limited amount of work has been dedicated to the determination of this hyperparameter or equivalent hyperparameters for other approximate kernel methods. Various bounds on the statistical accuracy of the Nyström method and related approximations have been derived, which could guide the choice of this hyperparameter, but this might not be the most suitable strategy.

Acknowledgements

We would like to thank Ricardo Silva at the Department of Statistical Science at UCL for helpful comments and guidance.

References

Bollobás, B. (1999). Linear analysis. Cambridge University Press, Cambridge, UK, 2nd edition.
Brand, M. (2006). Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1).
Bunch, J. R., Nielsen, C. P., and Sorensen, D. C. (1978). Rank-one modification of the symmetric eigenproblem. Numerische Mathematik, 31(1):31–48.
Chin, T.-J. and Suter, D. (2007). Incremental kernel principal component analysis. IEEE Transactions on Image Processing, 16(6).
Dongarra, J. J. and Sorensen, D. C. (1987). A fully parallel algorithm for the symmetric eigenvalue problem. SIAM Journal on Scientific and Statistical Computing, 8(2).
Gittens, A. and Mahoney, M. W. (2016). Revisiting the Nyström method for improved large-scale machine learning. Journal of Machine Learning Research, 17(Dec):1–65.
Golub, G. H. (1973). Some modified matrix eigenvalue problems. SIAM Review, 15(2).
Golub, G. H. and Van Loan, C. F. (2013). Matrix computations. Johns Hopkins University Press, Baltimore, MD, 4th edition.
Gu, M. and Eisenstat, S. C. (1994). A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM Journal on Matrix Analysis and Applications, 15(4).
Hoegaerts, L., De Lathauwer, L., Goethals, I., Suykens, J. A., Vandewalle, J., and De Moor, B. (2007). Efficiently updating and tracking the dominant kernel principal components. Neural Networks, 20(2).
Hofmann, T., Schölkopf, B., and Smola, A. J. (2008). Kernel methods in machine learning. The Annals of Statistics, 36(3).
Jolliffe, I. (2002). Principal component analysis. Springer, New York, NY, 2nd edition.
Kim, K. I., Franz, M. O., and Schölkopf, B. (2005). Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9).
Lichman, M. (2013). UCI machine learning repository.
Lim, J., Ross, D. A., Lin, R.-S., and Yang, M.-H. (2004). Incremental learning for visual tracking. In Advances in Neural Information Processing Systems.
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., and Watkins, C. (2002). Text classification using string kernels. Journal of Machine Learning Research, 2(Feb).
Mahoney, M. W. (2011). Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2).
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., and Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop. IEEE.
Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15(3).
Rudi, A., Camoriano, R., and Rosasco, L. (2015). Less is more: Nyström computational regularization. In Advances in Neural Information Processing Systems.
Schölkopf, B., Herbrich, R., and Smola, A. (2001). A generalized representer theorem. In Computational Learning Theory (COLT). Springer.
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5).
Sheikholeslami, F., Berberidis, D., and Giannakis, G. B. (2015). Kernel-based low-rank feature extraction on a budget for big data streams. In IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE.
Sorensen, D. C. and Tang, P. T. P. (1991). On the orthogonality of eigenvectors computed by divide-and-conquer techniques. SIAM Journal on Numerical Analysis, 28(6).
Tokumoto, T. and Ozawa, S. (2011). A fast incremental kernel principal component analysis for learning stream of data chunks. In International Joint Conference on Neural Networks (IJCNN). IEEE.
Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., and Borgwardt, K. M. (2010). Graph kernels. Journal of Machine Learning Research, 11(Apr).
Williams, C. and Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems.
Yang, T., Li, Y.-F., Mahdavi, M., Jin, R., and Zhou, Z.-H. (2012). Nyström method vs random Fourier features: A theoretical and empirical comparison. In Advances in Neural Information Processing Systems.


More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information