Kernel Matching Pursuit


Pascal Vincent and Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc, H3C 3J7, Canada

Technical Report #1179
Département d'Informatique et Recherche Opérationnelle, Université de Montréal
August 28th, 2000

Abstract

Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine-learning problems, while keeping control of the sparsity of the solution. We also derive MDL-motivated generalization bounds for this type of algorithm, and compare them to related SVM (Support Vector Machine) bounds. Finally, links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification, are given, showing comparable results with typically sparser models.

1 Introduction

Recently, there has been a renewed interest in kernel-based methods, due in great part to the success of the Support Vector Machine approach (Boser, Guyon and Vapnik, 1992; Vapnik, 1995). Kernel-based learning algorithms represent the function f(x) to be learnt with a linear combination of terms of the form K(x, x_i), where x_i is generally the input vector associated with one of the training examples, and K is a symmetric positive definite kernel function. Support Vector Machines (SVMs) are kernel-based learning algorithms in which only a fraction of the training examples are used in the solution (these are called the Support Vectors), and where the objective of learning is to maximize a margin around the decision surface (in the case of classification). Matching Pursuit was originally introduced in the signal-processing community as an algorithm that decomposes any signal into a linear expansion of waveforms selected from a redundant dictionary of functions (Mallat and Zhang, 1993).

It is a general, greedy, sparse function approximation scheme with the squared error loss, which iteratively adds new functions (i.e. basis functions) to the linear expansion. If we take as dictionary of functions the functions d_i(x) of the form K(x, x_i), where x_i is the input part of a training example, then the linear expansion has essentially the same form as a Support Vector Machine. Matching Pursuit and its variants were developed primarily in the signal-processing and wavelets community, but there are many interesting links with the research on kernel-based learning algorithms developed in the machine-learning community. Connections between a related algorithm (basis pursuit (Chen, 1995)) and SVMs had already been reported in (Poggio and Girosi, 1998). More recently, (Smola and Schölkopf, 2000) shows connections between Matching Pursuit, Kernel-PCA, Sparse Kernel Feature Analysis, and how this kind of greedy algorithm can be used to compress the design matrix in SVMs to allow handling of huge data-sets.

Sparsity of representation is an important issue, both for the computational efficiency of the resulting representation, and for its theoretical and practical influence on generalization performance (see (Graepel, Herbrich and Shawe-Taylor, 2000) and (Floyd and Warmuth, 1995)). However, the sparsity of the solutions found by the SVM algorithm is hardly controllable, and often these solutions are not very sparse. Our research started as a search for a flexible alternative framework that would allow us to directly control the sparsity (in terms of number of support vectors) of the solution and remove the requirement of positive definiteness of K (and the representation of K as a dot product in a high-dimensional "feature space"). It led us to uncover connections between greedy Matching Pursuit algorithms, Radial Basis Function training procedures, and boosting algorithms (section 4). We will discuss these together with a description of the proposed algorithm and extensions thereof to use margin loss functions.

We first (section 2) give an overview of the Matching Pursuit family of algorithms (the basic version and two refinements thereof), as a general framework, taking a machine-learning viewpoint. We also give a detailed description of our particular implementation that chooses the next basis function to add to the expansion by minimizing simultaneously over the expansion weights and the choice of the basis function, in a computationally efficient manner. We then show (section 3) how this framework can be extended to allow the use of differentiable loss functions other than the squared error to which the original algorithms are limited. This might be more appropriate for some classification problems (although, in our experiments, we have used the squared loss for many classification problems, always with successful results). This is followed by a discussion of margin loss functions, underlining their similarity with more traditional loss functions that are commonly used for neural networks. In section 4 we explain how the matching pursuit family of algorithms can be used to build kernel-based solutions to machine-learning problems, and how this relates to other machine-learning algorithms, namely SVMs, boosting algorithms, and Radial Basis Function training procedures. In section 5, we use previous theoretical work on the minimum description length principle to construct generalization error bounds for the proposed algorithm. Basically, the generalization error is bounded by the training error plus terms that grow with the fraction of support vectors.
These bounds are compared with bounds obtained for Support Vector Machines. Finally, in section 6, we provide an experimental comparison between SVMs and different variants of Matching Pursuit, performed on artificial data, USPS digit classification, and UCI machine-learning database benchmarks. The main experimental result is that Kernel Matching Pursuit algorithms can yield generalization performance as good as Support Vector Machines, but often using significantly fewer support vectors.

2 Three flavors of Matching Pursuit

In this section we first describe the basic Matching Pursuit algorithm, as it was introduced by (Mallat and Zhang, 1993), but from a machine-learning perspective rather than a signal-processing one. We then present two successive refinements of the basic algorithm.

2.1 Basic Matching Pursuit

We are given noisy observations {y_1, ..., y_ℓ} of a target function f ∈ H at points {x_1, ..., x_ℓ}. We are also given a finite dictionary D = {d_1, ..., d_m} of functions in a Hilbert space H, and we are interested in sparse approximations of f that are expansions of the form

    \hat{f}_N = \sum_{n=1}^{N} \alpha_n g_n    (1)

where (α_1, ..., α_N) ∈ ℝ^N and {g_1, ..., g_N} ⊂ D are chosen to minimize the squared norm of the residue, ‖R_N‖² = ‖f − \hat{f}_N‖². We shall call the set {g_1, ..., g_N} our basis, and N the number of basis functions in the expansion.

Notice that, in a typical machine-learning framework, all we have are noisy observations of the target function f at the data points x_{1..ℓ}. So we sometimes abuse the notation, using f to actually mean (y_1, ..., y_ℓ). Also, throughout this article, for all practical purposes, during training, any function in H can be associated to an ℓ-dimensional vector that represents the function evaluated at the x_{1..ℓ} data points. We will make extensive use of this abuse of notation for convenience; in particular the notation ⟨g, h⟩ will be used to represent the dot product between the two ℓ-dimensional vectors associated with functions g and h, and ‖h‖ is used to represent the L_2 norm of the vector associated to a function h. Only when using the learnt approximation on new test data do we use the dictionary functions as actual functions.

Now, finding the optimal basis {g_1, ..., g_N} for a given number N of allowed basis functions is in general an NP-complete problem. So the matching pursuit algorithm proceeds in a greedy, constructive fashion: it starts at stage 0 with \hat{f}_0 = 0, and recursively appends functions to an initially empty basis, at each stage n trying to reduce the norm of the residue R_n = f − \hat{f}_n. Given \hat{f}_n we build \hat{f}_{n+1} = \hat{f}_n + α_{n+1} g_{n+1} by searching for g_{n+1} ∈ D and α_{n+1} ∈ ℝ that minimize the squared norm of the residue, ‖R_{n+1}‖² = ‖R_n − α_{n+1} g_{n+1}‖², i.e.

    (g_{n+1}, \alpha_{n+1}) = \arg\min_{g \in D,\ \alpha \in \mathbb{R}} \Big\| \underbrace{\sum_{k=1}^{n} \alpha_k g_k}_{\hat{f}_n} + \alpha g - f \Big\|^2    (2)

INPUT:
  data set {(x_1, y_1), ..., (x_ℓ, y_ℓ)}
  dictionary of functions D = {d_1, ..., d_m}
  number N of basis functions desired in the expansion (or, alternatively, a validation set to decide when to stop)
INITIALIZE: residue vector R and dictionary matrix D
  R ← (y_1, ..., y_ℓ)ᵀ   and   D(i, k) ← d_k(x_i)  for i = 1..ℓ, k = 1..m
FOR n = 1..N (or until performance on validation set stops improving):
  γ_n ← arg max_{k=1..m} |⟨D(., k), R⟩| / ‖D(., k)‖
  α_n ← ⟨D(., γ_n), R⟩ / ‖D(., γ_n)‖²
  R ← R − α_n D(., γ_n)
RESULT: the solution found is defined by
  \hat{f}_N(x) = \sum_{n=1}^{N} \alpha_n d_{\gamma_n}(x)

Figure 1: Basic Matching Pursuit Algorithm

The g_{n+1} that minimizes this expression is the one that maximizes |⟨g_{n+1}, R_n⟩| / ‖g_{n+1}‖, and the corresponding α_{n+1} is

    \alpha_{n+1} = \frac{\langle g_{n+1}, R_n \rangle}{\|g_{n+1}\|^2}

We have not yet specified how to choose N (i.e. when to stop). In the signal-processing literature the algorithm is usually stopped when the reconstruction error (‖R‖²) goes below a predefined threshold. For machine-learning problems, we shall rather use the error estimated on an independent validation set¹ to decide when to stop. In any case, N can be seen as the primary capacity-control parameter of the algorithm. In section 5, we show that the generalization error of matching pursuit algorithms can be directly linked to the ratio N/ℓ (ℓ is the number of training examples). The pseudo-code for the corresponding algorithm is given in figure 1 (there are slight differences in the notation, in particular g_n in the above explanations corresponds to vector D(., γ_n) in the more detailed pseudo-code).

¹ or a more computationally intensive cross-validation technique if the data is scarce.
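To make the procedure of figure 1 concrete, here is a minimal NumPy sketch of the basic matching pursuit loop; the function and variable names are illustrative assumptions, not code from this report.

import numpy as np

def basic_matching_pursuit(D, y, N):
    """D: (l, m) matrix with D[i, k] = d_k(x_i); y: (l,) targets; N: number of basis functions."""
    R = y.astype(float).copy()             # residue, initially the target vector
    norms = np.linalg.norm(D, axis=0)      # ||D(., k)|| for every dictionary column
    gammas, alphas = [], []
    for _ in range(N):
        scores = np.abs(D.T @ R) / norms              # |<D(., k), R>| / ||D(., k)||
        k = int(np.argmax(scores))                    # gamma_n: best-matching column
        a = (D[:, k] @ R) / (norms[k] ** 2)           # alpha_n = <D(., k), R> / ||D(., k)||^2
        R = R - a * D[:, k]                           # shrink the residue
        gammas.append(k)
        alphas.append(a)
    return np.array(gammas), np.array(alphas)

Given the dictionary matrix of figure 1, basic_matching_pursuit(D, y, N) returns the chosen column indexes γ_{1..N} and the corresponding weights α_{1..N}; note that, as discussed below for the 2D experiments, the basic version may repeatedly re-select the same columns to refine their weights.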

2.2 Matching Pursuit with backfitting

In the basic version of the algorithm, not only is the set of basis functions g_{1..n} obtained at every step n suboptimal, but so are also their α_{1..n} coefficients. This can be corrected in a step often called back-fitting or back-projection, and the resulting algorithm is known as Orthogonal Matching Pursuit (OMP) (Pati, Rezaiifar and Krishnaprasad, 1993; Davis, Mallat and Zhang, 1994): while still choosing g_{n+1} as previously (equation 2), we recompute the optimal set of coefficients α_{1..n+1} at each step instead of only the last α_{n+1}:

    \alpha^{(n+1)}_{1..n+1} = \arg\min_{\alpha_{1..n+1} \in \mathbb{R}^{n+1}} \Big\| \sum_{k=1}^{n+1} \alpha_k g_k - f \Big\|^2    (3)

Note that this is just like a linear regression with parameters α_{1..n+1}. This back-projection step also has a geometrical interpretation: let B_n be the sub-space of H spanned by the basis (g_1, ..., g_n), and let B_n^⊥ be its orthogonal complement in H. Let P_{B_n} and P_{B_n^⊥} denote the projection operators on these subspaces. Then any g ∈ H can be decomposed as g = P_{B_n} g + P_{B_n^⊥} g (see figure 2). Ideally, we want the residue R_n to be as small as possible, so given the basis at step n, we want \hat{f}_n = P_{B_n} f and R_n = P_{B_n^⊥} f. This is what (3) ensures. But whenever we append the next α_{n+1} g_{n+1} found by (2) to the expansion, we actually add its two orthogonal components:

P_{B_n^⊥} α_{n+1} g_{n+1}, which contributes to reducing the norm of the residue;

P_{B_n} α_{n+1} g_{n+1}, which increases the norm of the residue. However, as this latter part belongs to B_n, it can be compensated for by adjusting the previous coefficients of the expansion: this is what the back-projection does.

Figure 2: Geometrical interpretation of Matching Pursuit and back-projection.

(Davis, Mallat and Zhang, 1994) suggest maintaining an additional orthogonal basis of the B_n space to facilitate this back-projection, which results in a computationally efficient algorithm².

² In our implementation, we used a slightly modified version of this approach, described in the prefitting algorithm below.
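A minimal sketch of the backfitting variant, under the assumption that the re-estimation of equation (3) is simply done with an ordinary least-squares solve at every step; the names are illustrative and the efficient orthogonal-basis bookkeeping of (Davis, Mallat and Zhang, 1994) is deliberately omitted.

import numpy as np

def omp_backfitting(D, y, N):
    """Choose columns as in basic MP, then re-fit all coefficients by least squares (eq. 3)."""
    R = y.astype(float).copy()
    norms = np.linalg.norm(D, axis=0)
    support = []                               # indexes gamma_1..gamma_n of chosen columns
    alphas = np.zeros(0)
    for _ in range(N):
        scores = np.abs(D.T @ R) / norms
        scores[support] = -np.inf              # optional: after backfitting these scores are ~0 anyway
        support.append(int(np.argmax(scores)))
        B = D[:, support]                      # current basis g_1..g_n as columns
        alphas, *_ = np.linalg.lstsq(B, y, rcond=None)   # equation (3): optimal alpha_{1..n}
        R = y - B @ alphas                     # residue is now orthogonal to the chosen basis
    return support, alphas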

2.3 Matching Pursuit with prefitting

With backfitting, the choice of the function to append at each step is made regardless of the later possibility to update all weights: as we find g_{n+1} using (2) and only then optimize (3), we might be picking a dictionary function other than the one that would give the best fit. Instead, it is possible to directly optimize

    \left(g_{n+1}, \alpha^{(n+1)}_{1..n+1}\right) = \arg\min_{g \in D,\ \alpha_{1..n+1} \in \mathbb{R}^{n+1}} \Big\| \sum_{k=1}^{n} \alpha_k g_k + \alpha_{n+1} g - f \Big\|^2    (4)

We shall call this procedure prefitting, to distinguish it from the former backfitting (as backfitting is done only after the choice of g_{n+1}). This can be achieved almost as efficiently as backfitting. Our implementation maintains a representation of both the target and all dictionary vectors as a decomposition into their projections on B_n and B_n^⊥. As before, let B_n = span(g_1, ..., g_n). We maintain at each step a representation of each dictionary vector d as the sum of two orthogonal components:

the component d_{B_n} = P_{B_n} d lies in the space B_n spanned by the current basis and is expressed as a linear combination of the current basis vectors (it is an n-dimensional vector);

the component d_{B_n^⊥} = P_{B_n^⊥} d lies in B_n's orthogonal complement and is expressed in the original ℓ-dimensional vector space coordinates.

We also maintain the same representation for the target y, namely its decomposition into the current expansion \hat{f}_n ∈ B_n plus the orthogonal residue R_n ∈ B_n^⊥. Prefitting is then achieved easily by considering only the components in B_n^⊥: we choose g_{n+1} as the g ∈ D whose g_{B_n^⊥} is most collinear with R_n ∈ B_n^⊥. This procedure requires, at every step, only two passes through the dictionary (searching for g_{n+1}, then updating the representation) where basic matching pursuit requires one. The detailed pseudo-code for this algorithm is given in figure 3.

2.4 Summary of the three variations of MP

Regardless of the computational tricks that use orthogonality properties for efficient computation, the three versions of matching pursuit differ only in the way the next function to append to the basis is chosen and the α coefficients are updated at each step n:

Basic version: we find the optimal g_n to append to the basis and its optimal α_n, while keeping all other coefficients fixed (equation 2).

Backfitting version: we find the optimal g_n while keeping all coefficients fixed (equation 2). Then we find the optimal set of coefficients α^{(n)}_{1..n} for the new basis (equation 3).

Prefitting version: we find at the same time the optimal g_n and the optimal set of coefficients α^{(n)}_{1..n} (equation 4).

INPUT:
  data set {(x_1, y_1), ..., (x_ℓ, y_ℓ)}
  dictionary of functions D = {d_1, ..., d_m}
  number N of basis functions desired in the expansion (or, alternatively, a validation set to decide when to stop)
INITIALIZE: residue vector R and the two dictionary matrix components D_{B⊥} and D_B
  R ← (y_1, ..., y_ℓ)ᵀ   and   D_{B⊥}(i, k) ← d_k(x_i)  for i = 1..ℓ, k = 1..m
  D_B is initially empty, and gets appended an additional row at each step (thus, ignore the expressions that involve D_B during the first iteration, when n = 1)
FOR n = 1..N (or until performance on validation set stops improving):
  γ_n ← arg max_{k=1..m} |⟨D_{B⊥}(., k), R⟩| / ‖D_{B⊥}(., k)‖
  α_n ← ⟨D_{B⊥}(., γ_n), R⟩ / ‖D_{B⊥}(., γ_n)‖²
  the B⊥ component of α_n d_{γ_n} reduces the residue:
    R ← R − α_n D_{B⊥}(., γ_n)
  compensate for the B component of α_n d_{γ_n} by adjusting the previous α:
    (α_1, ..., α_{n−1}) ← (α_1, ..., α_{n−1}) − α_n D_B(., γ_n)
  now update the dictionary representation to take into account the new basis function d_{γ_n}:
  FOR i = 1..m AND i ≠ γ_n:
    β_i ← ⟨D_{B⊥}(., γ_n), D_{B⊥}(., i)⟩ / ‖D_{B⊥}(., γ_n)‖²
    D_{B⊥}(., i) ← D_{B⊥}(., i) − β_i D_{B⊥}(., γ_n)
    D_B(., i) ← D_B(., i) − β_i D_B(., γ_n)
  D_{B⊥}(., γ_n) ← 0
  D_B(., γ_n) ← 0
  β_{γ_n} ← 1
  D_B ← [ D_B ; (β_1, ..., β_m) ]   (append the row of β's)
RESULT: the solution found is defined by
  \hat{f}_N(x) = \sum_{n=1}^{N} \alpha_n d_{\gamma_n}(x)

Figure 3: Matching Pursuit with prefitting
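The following NumPy sketch mirrors the bookkeeping of figure 3, keeping for every dictionary column its component inside the span of the current basis (in basis coordinates) and its orthogonal remainder; the variable names and the dense-matrix representation are assumptions made for illustration.

import numpy as np

def kmp_prefitting(D, y, N):
    l, m = D.shape
    R = y.astype(float).copy()
    D_perp = D.astype(float).copy()     # components of each d_i orthogonal to the current basis
    D_B = np.zeros((0, m))              # components of each d_i inside the basis (basis coordinates)
    gammas, alphas = [], np.zeros(0)
    for n in range(N):
        norms = np.linalg.norm(D_perp, axis=0)
        norms[norms < 1e-12] = np.inf                  # columns already in the span cannot be chosen
        k = int(np.argmax(np.abs(D_perp.T @ R) / norms))
        g = D_perp[:, k].copy()
        a = (g @ R) / (g @ g)
        R = R - a * g                                  # the B-perp part of a*d_k reduces the residue
        alphas = np.append(alphas - a * D_B[:, k], a)  # back-project the B part onto old coefficients
        beta = (D_perp.T @ g) / (g @ g)                # projections of every column on the new direction
        D_perp = D_perp - np.outer(g, beta)            # remove that direction from the orthogonal parts
        D_B = np.vstack([D_B - np.outer(D_B[:, k], beta), beta])  # re-express the basis coordinates
        gammas.append(k)
    return gammas, alphas

Each iteration amounts to two passes through the dictionary (column selection, then the rank-one updates), in line with the O(N·m·ℓ) complexity discussed next.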

When making use of orthogonality properties for efficient implementations of the backfitting and prefitting versions (as in our previously described implementation of the prefitting algorithm), all three algorithms have a computational complexity of the same order O(N·m·ℓ).

3 Extension to non-squared error loss

3.1 Gradient descent in function space

It has already been noticed that boosting algorithms perform a form of gradient descent in function space with respect to particular loss functions (Schapire et al., 1998; Mason et al., 2000). Following (Friedman, 1999), the technique can be adapted to extend the Matching Pursuit family of algorithms to optimize arbitrary differentiable loss functions, instead of doing least-squares fitting.

Given a loss function L(y_i, \hat{f}_n(x_i)) that computes the cost of predicting a value of \hat{f}_n(x_i) when the true target was y_i, we use an alternative residue \tilde{R}_n rather than the usual R_n = y − \hat{f}_n when searching for the next dictionary element to append to the basis at each step. \tilde{R}_n is the direction of steepest descent in function space (evaluated at the data points) with respect to L, i.e. the opposite of the gradient:

    \tilde{R}_n = -\left( \frac{\partial L\big(y_1, \hat{f}_n(x_1)\big)}{\partial \hat{f}_n(x_1)}, \ldots, \frac{\partial L\big(y_\ell, \hat{f}_n(x_\ell)\big)}{\partial \hat{f}_n(x_\ell)} \right)    (5)

i.e. g_{n+1} is chosen such that it is most collinear with this direction:

    g_{n+1} = \arg\max_{g \in D} \frac{\left|\langle g, \tilde{R}_n \rangle\right|}{\|g\|}    (6)

A line-minimization procedure can then be used to find the corresponding coefficient:

    \alpha_{n+1} = \arg\min_{\alpha \in \mathbb{R}} \sum_{i=1}^{\ell} L\big(y_i, \hat{f}_n(x_i) + \alpha\, g_{n+1}(x_i)\big)    (7)

This corresponds to basic matching pursuit (notice how the original squared-error algorithm is recovered when L is the squared error: L(a, b) = (a − b)²). It is also possible to do backfitting, by re-optimizing all α_{1..n+1} (instead of only α_{n+1}) to minimize the target cost (with a conjugate gradient optimizer for instance):

    \alpha^{(n+1)}_{1..n+1} = \arg\min_{\alpha_{1..n+1} \in \mathbb{R}^{n+1}} \sum_{i=1}^{\ell} L\Big(y_i, \sum_{k=1}^{n+1} \alpha_k g_k(x_i)\Big)    (8)

But as this can be quite time-consuming (we cannot use any orthogonality property in this general case), it may be desirable to do it every few steps instead of every single step. The corresponding algorithm is described in more detail in the pseudo-code of figure 4 (as previously, there are slight differences in the notation, in particular g_k in the above explanation corresponds to vector D(., γ_k) in the more detailed pseudo-code).
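A sketch of a single greedy step with an arbitrary differentiable loss, following equations (5)–(7); the squared-error loss used as the example and the use of SciPy's scalar minimizer for the line search are assumptions made for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

def loss(y, f):
    """Example loss L(y, f) = (y - f)^2, summed over the data points."""
    return np.sum((y - f) ** 2)

def loss_grad(y, f):
    """dL/df for the example loss: -2 (y - f)."""
    return -2.0 * (y - f)

def kmp_step_any_loss(D, y, f_hat):
    """One greedy step; returns the chosen column index and its weight."""
    R_tilde = -loss_grad(y, f_hat)                      # steepest-descent direction (eq. 5)
    norms = np.linalg.norm(D, axis=0)
    k = int(np.argmax(np.abs(D.T @ R_tilde) / norms))   # most collinear column (eq. 6)
    line = lambda a: loss(y, f_hat + a * D[:, k])       # 1-D line minimization (eq. 7)
    a = minimize_scalar(line).x
    return k, a

Repeating this step, and occasionally re-optimizing all weights with a generic optimizer, gives the loop of figure 4 below.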

Finally, let us mention that it should in theory also be possible to do prefitting with an arbitrary loss function, but finding the optimal {g_{k+1} ∈ D, α_{1..k+1} ∈ ℝ^{k+1}} in the general case (when we cannot use any orthogonal decomposition) would involve solving equation 8 in turn for each dictionary function in order to choose the next one to append to the basis, which is computationally prohibitive.

3.2 Margin loss functions versus traditional loss functions for classification

Now that we have seen how the matching pursuit family of algorithms can be extended to use arbitrary loss functions, let us discuss the merits of various loss functions. In particular, the relationship between loss functions and the notion of margin is of primary interest here, as we wanted to build an alternative to SVMs³.

While the original notion of margin in classification problems comes from the geometrically inspired hard margin of linear SVMs (the smallest Euclidean distance between the decision surface and the training points), a slightly different perspective has emerged in the boosting community along with the notion of margin loss function. The margin quantity m = y\hat{f}(x) of an individual data point (x, y), with y ∈ {−1, +1}, can be understood as a confidence measure of its classification by the function \hat{f}, while the class decided for is given by sign(\hat{f}(x)). A margin loss function is simply a function of this margin quantity m that is being optimized.

It is possible to formulate SVM training so as to expose the SVM margin loss function. Let ϕ be the mapping into the feature space of SVMs, such that ⟨ϕ(x_i), ϕ(x_j)⟩ = K(x_i, x_j). The SVM solution can be expressed in this feature space as \hat{f}(x) = ⟨w, ϕ(x)⟩ + b, where

    w = \sum_{x_i \in SV} \alpha_i y_i \varphi(x_i)

where SV is the set of support vectors, and the solution is the one that minimizes

    \sum_{i=1}^{\ell} \left[1 - y_i \hat{f}(x_i)\right]_+ + \frac{1}{C}\,\|w\|^2    (9)

where C is the box-constraint parameter of SVMs, and the notation [x]_+ is to be understood as the function that gives [x]_+ = x when x > 0 and 0 otherwise. Let m = y\hat{f}(x) be the individual margin at point x. (9) is clearly the sum of a margin loss function and a regularization term.

It is interesting to compare this margin loss function to those used in boosting algorithms and to the more traditional cost functions. The loss functions that boosting algorithms optimize are typically expressed as functions of m. Thus AdaBoost (Schapire et al., 1998) uses an exponential (e^{−m}) margin loss function, LogitBoost (Friedman, Hastie and Tibshirani, 1998) uses the negative binomial log-likelihood, log₂(1 + e^{−2m}), whose shape is similar to a smoothed version of the soft-margin SVM loss function [1 − m]_+, and Doom II (Mason et al., 2000) approximates a theoretically motivated margin loss with 1 − tanh(m).

³ whose good generalization abilities are believed to be due to margin maximization.
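For reference, the margin loss functions discussed above can be written directly as functions of m = y·f̂(x); a small sketch follows (the function names are ours, not the paper's).

import numpy as np

def adaboost_loss(m):            # exponential loss
    return np.exp(-m)

def logitboost_loss(m):          # negative binomial log-likelihood
    return np.log2(1.0 + np.exp(-2.0 * m))

def doom2_loss(m):               # approximation used in Doom II
    return 1.0 - np.tanh(m)

def svm_hinge_loss(m):           # soft-margin SVM loss [1 - m]_+
    return np.maximum(0.0, 1.0 - m)

def squared_loss_as_margin(m):   # (f(x) - y)^2 = (1 - m)^2 for y in {-1, +1}
    return (1.0 - m) ** 2

def squared_loss_after_tanh(m):  # (tanh(f(x)) - 0.65 y)^2 = (0.65 - tanh(m))^2
    return (0.65 - np.tanh(m)) ** 2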

INPUT:
  data set {(x_1, y_1), ..., (x_ℓ, y_ℓ)}
  dictionary of functions D = {d_1, ..., d_m}
  number N of basis functions desired in the expansion (or, alternatively, a validation set to decide when to stop)
  how often to do a full backfitting: every p update steps
  a loss function L
INITIALIZE: current approximation \hat{f} and dictionary matrix D
  \hat{f} ← (0, ..., 0)ᵀ   and   D(i, k) ← d_k(x_i)  for i = 1..ℓ, k = 1..m
FOR n = 1..N (or until performance on validation set stops improving):
  \tilde{R} ← -\left( \frac{\partial L(y_1, \hat{f}_1)}{\partial \hat{f}_1}, \ldots, \frac{\partial L(y_\ell, \hat{f}_\ell)}{\partial \hat{f}_\ell} \right)
  γ_n ← arg max_{k=1..m} |⟨D(., k), \tilde{R}⟩| / ‖D(., k)‖
  If n is not a multiple of p, do a simple line minimization:
    α_n ← arg min_{α∈ℝ} Σ_{i=1}^{ℓ} L(y_i, \hat{f}_i + α D(i, γ_n))
    and update \hat{f}:  \hat{f} ← \hat{f} + α_n D(., γ_n)
  If n is a multiple of p, do a full backfitting (for example with gradient descent):
    α_{1..n} ← arg min_{α_{1..n}∈ℝⁿ} Σ_{i=1}^{ℓ} L(y_i, Σ_{k=1}^{n} α_k D(i, γ_k))
    and recompute  \hat{f} ← Σ_{k=1}^{n} α_k D(., γ_k)
RESULT: the solution found is defined by
  \hat{f}_N(x) = \sum_{n=1}^{N} \alpha_n d_{\gamma_n}(x)

Figure 4: Backfitting Matching Pursuit Algorithm with non-squared loss

As can be seen in figure 5 (left), all these functions encourage large positive margins, and differ mainly in how they penalize large negative ones. In particular, 1 − tanh(m) is expected to be more robust, as it won't penalize outliers to excess.

It is enlightening to compare these with the more traditional loss functions that have been used for neural networks in classification tasks (i.e. y ∈ {−1, +1}), when we express them as functions of m:

Squared loss: (\hat{f}(x) - y)^2 = (1 - m)^2

Squared loss after tanh with modified target: (\tanh(\hat{f}(x)) - 0.65\,y)^2 = (0.65 - \tanh(m))^2

Both are illustrated in figure 5 (right). Notice how the squared loss after tanh appears similar to the margin loss function used in Doom II, except that it slightly increases for large positive margins, which is why it behaves well and does not saturate even with unconstrained weights (boosting and SVM algorithms impose constraints on the weights, here denoted α's).

Figure 5: Boosting and SVM margin loss functions (left: exp(−m) [AdaBoost], log(1+exp(−m)) [LogitBoost], 1−tanh(m) [Doom II], (1−m)_+ [SVM]) vs. traditional loss functions (right: squared error as a margin cost function, squared error after tanh with 0.65 target), viewed as functions of the margin m = y·f(x). Interestingly, the latest of the margin-motivated loss functions (used in Doom II) is similar to the traditional squared error after tanh.

4 Kernel Matching Pursuit and links with other paradigms

4.1 Matching pursuit with a kernel-based dictionary

Kernel Matching Pursuit (KMP) is simply the idea of applying the Matching Pursuit family of algorithms to problems in machine learning, using a kernel-based dictionary: given a kernel function K : ℝ^d × ℝ^d → ℝ, we use as our dictionary the kernel centered on the training points: D = {d_i = K(·, x_i) | i = 1..ℓ}. Optionally, the constant function can also be included in the dictionary, which accounts for a bias term b; the functional form of the approximation \hat{f}_N then becomes

    \hat{f}_N(x) = b + \sum_{n=1}^{N} \alpha_n K(x, x_{\gamma_n})    (10)

where the γ_{1..N} are the indexes of the support points.
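A minimal sketch of how such a kernel-based dictionary can be assembled and how the resulting expansion (10) is evaluated on new points, assuming a Gaussian kernel; the helper names are ours and the greedy routines sketched in section 2 are reused unchanged.

import numpy as np

def gaussian_kernel(X1, X2, sigma):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def kernel_dictionary(X, sigma, with_bias=True):
    """Rows are training points, columns are dictionary functions evaluated at those points."""
    D = gaussian_kernel(X, X, sigma)                       # column i is K(., x_i) at the training points
    if with_bias:
        D = np.hstack([np.ones((X.shape[0], 1)), D])       # constant column recovers the bias term b
    return D

def predict(X_new, X, sigma, gammas, alphas, with_bias=True):
    """Evaluate f_N(x) = b + sum_n alpha_n K(x, x_{gamma_n}) on new points."""
    cols = gaussian_kernel(X_new, X, sigma)
    if with_bias:
        cols = np.hstack([np.ones((X_new.shape[0], 1)), cols])
    return cols[:, gammas] @ alphas

For example, D = kernel_dictionary(X, sigma) followed by the prefitting sketch of section 2.3 yields the indexes γ and weights α that predict() then consumes on test data.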

During training we only consider the values of the dictionary functions at the training points, so that it amounts to doing Matching Pursuit in a vector space of dimension ℓ. When using a squared error loss⁴, the complexity of all three variations of KMP (basic, backfitting and prefitting) is O(N·m·ℓ) = O(N·ℓ²) if we use all the training data as candidate support points. But it is also possible to use a random subset of the training points as support candidates (which yields an m < ℓ).

We would also like to emphasize the fact that the use of a dictionary gives a lot of additional flexibility to this framework, as it is possible to include any kind of function in it, in particular:

There is no restriction on the shape of the kernel (no positive-definiteness constraint, it could be asymmetrical, etc.).

The dictionary could include more than a single fixed kernel shape: it could mix different kernel types to choose from at each point, allowing for instance the algorithm to choose among several widths of a Gaussian for each support point.

Similarly, the dictionary could easily be used to constrain the algorithm to use a kernel shape specific to each class, based on prior knowledge.

The dictionary can incorporate non-kernel-based functions (we already mentioned the constant function to recover the bias term b, but this could also be used to incorporate prior knowledge).

For huge data-sets, a reduced subset can be used as the dictionary to speed up the training.

However, in this study we restrict ourselves to using a single fixed kernel, so that the resulting functional form is the same as the one obtained with SVMs.

4.2 Similarities and differences with SVMs

The functional form (10) is very similar to the one obtained with the Support Vector Machine (SVM) algorithm (Boser, Guyon and Vapnik, 1992), the main difference being that SVMs impose further constraints on α_{1..N}. However, the quantity optimized by the SVM algorithm is quite different from the KMP greedy optimization, especially when using a squared error loss. Consequently, the support vectors and coefficients found by the two types of algorithms are usually different (see our experimental results in section 6).

Another important difference, and one that was a motivation for this research, is that in KMP capacity control is achieved by directly controlling the sparsity of the solution, i.e. the number N of support vectors, whereas the capacity of SVMs is controlled through the box-constraint parameter C, which has an indirect and hardly controllable influence on sparsity. See (Graepel, Herbrich and Shawe-Taylor, 2000) for a discussion of the merits of sparsity and margin, and ways to combine them.

⁴ The algorithms generalized to arbitrary loss functions can be much more computationally intensive, as they imply a non-linear optimization step.
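Referring back to the remark in section 4.1 that a random subset of m < ℓ training points can serve as candidate support points, here is a minimal sketch of that reduction; the subset size and the seeding are assumptions.

import numpy as np

def subsampled_kernel_dictionary(X, sigma, n_candidates, seed=0):
    """Use only m < l randomly chosen training points as candidate support points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_candidates, replace=False)
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)   # l x m squared distances
    return np.exp(-d2 / sigma ** 2), idx                       # l x m dictionary matrix, candidate indexes

With such a reduced dictionary, the cost of a squared-error KMP run drops from O(N·ℓ²) to O(N·m·ℓ).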

4.3 Link with Radial Basis Functions

Squared-error KMP with a Gaussian kernel and prefitting appears to be identical to a particular Radial Basis Functions training algorithm called Orthogonal Least Squares RBF (Chen, Cowan and Grant, 1991) (OLS-RBF). In (Schölkopf et al., 1997) SVMs were compared to classical RBFs, where the RBF centers were chosen by unsupervised k-means clustering, and SVMs gave better results. To our knowledge, however, there has been no experimental comparison between OLS-RBF and SVMs, although their resulting functional forms are very much alike. Such an empirical comparison is one of the contributions of this paper. Basically, our results (section 6) show OLS-RBF (i.e. squared-error KMP) to perform as well as Gaussian SVMs, while allowing a tighter control of the number of support vectors used in the solution.

4.4 Boosting with kernels

KMP in its basic form generalized to using non-squared error is also very similar to boosting algorithms (Freund and Schapire, 1996; Friedman, Hastie and Tibshirani, 1998), in which the chosen class of weak learners would be the set of kernels centered on the training points. These algorithms differ mainly in the loss function they optimize, which we have already discussed in section 3.2.

5 Bounds on generalization error

The results of Vapnik on the Minimum Description Length principle (Vapnik, 1995; Vapnik, 1998) provide a possible framework for establishing bounds on the expected generalization error of KMP algorithms. One can also simply use the results on the generalization error obtained when the number of possible functions is a finite number M (and the capacity is therefore bounded by log M). We will show that, essentially, the bound depends linearly on the number of support vectors and logarithmically on the total number of training examples.

Vapnik's result (theorem 4.3, (Vapnik, 1995)) states that the expected generalization error rate E_gen, for binary classification, when training with ℓ examples, is less than 2C log(2) − 2 log(η)/ℓ with probability greater than 1 − η, where C is the compression rate: the number of bits to transfer the compressed conditional value of the training target classes (given the training input points) divided by the number of bits required to transmit them without compression, i.e. ℓ. When there are training errors, we can incorporate them into the compressed message by sending the identity (and the labels, in the multiclass case) of the wrongly labeled examples. The compression is due to the representation learned by the training algorithm. A good representation is one that requires few bits to represent the learned function, while keeping the training error low. This assumes that the number of possible functions is finite (which we will obtain by quantizing the α coefficients). To obtain compression, we take advantage of the sparse representation of the learned function in terms of only N support points. To obtain a rough bound we will encode the target outputs using three sets of bits, corresponding to three terms for C:

1. The first one is due to the classification errors: we have to send the identity and the correct class of the training errors. If the number of errors is e = ℓE_emp, that will cost log₂(ℓ choose e) bits.

In the case where the number of classes is N_c > 2, there is an increase in the number of bits by a factor log₂(N_c − 1), but there is a similar increase in the denominator of C (to encode the correct classes of all the training examples).

2. The second term is required to encode the identity of the support points: to choose N among ℓ examples requires log₂(ℓ choose N) bits.

3. The third term is to encode the quantized weights α_k associated with each support point, which will cost Np bits, where p is the number of bits of precision used to quantize the weights; it can be chosen as the smallest number that allows to obtain, with the discretized α's, the same classes on the training set as with the undiscretized α's.

To summarize, for KMP we have, for e training errors and N support vectors out of ℓ examples, with probability greater than 1 − η (over the choice of training set),

    E_{gen} < \frac{2\left(\log\binom{\ell}{e} + \log\binom{\ell}{N} + Np\log 2\right) - 2\log\eta}{\ell}    (11)

Note that \log\binom{\ell}{n} is poorly bounded by n\log_2\ell, in which the e/ℓ and N/ℓ ratios become apparent, but where a too large log ℓ factor appears. Slightly tighter bounds can be obtained using the result (Vapnik, 1995; Vapnik, 1998) for learning by choosing one function among M < ∞ functions: with probability at least 1 − η,

    E_{gen} \le E_{emp} + \frac{\log M - \log\eta}{\ell}\left(1 + \sqrt{1 + \frac{2 E_{emp}\,\ell}{\log M - \log\eta}}\right)    (12)

Using the same quantization of the α's (with precision p), one obtains, with \log M = \log\left(\binom{\ell}{N}\, 2^{Np}\right),

    E_{gen} < E_{emp} + \frac{\log\binom{\ell}{N} + Np\log 2 - \log\eta}{\ell}\left(1 + \sqrt{1 + \frac{2 E_{emp}\,\ell}{\log\binom{\ell}{N} + Np\log 2 - \log\eta}}\right)    (13)

In contrast, one can obtain an expectation bound (Vapnik, 1995) for SVMs that is E[E_gen] < E[E_emp] + E[N/ℓ], where E is the expectation over training sets (note that for SVMs, N is random because it depends on the training set). Note that the probability bounds can readily be converted into expectation bounds. For example, in the case of the MDL bound (eq. 11), one obtains that in expectation,

    E_{gen} < \frac{2\left(E\!\left[\log\binom{\ell}{e}\right] + \log\binom{\ell}{N} + Np\log 2 + 1\right)}{\ell}

To see the role of the ratio N/ℓ in the above, one can note that \log\binom{\ell}{N} < N\log\ell (but keep in mind that this is a rather poor bound).

Note that several related compression bounds have been studied, e.g. (Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995; Graepel, Herbrich and Shawe-Taylor, 2000). The results of (Graepel, Herbrich and Shawe-Taylor, 2000) are meant for maximum margin classifiers and draw interesting connections between sparsity and maximum margin. The results in (Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995) are very general (and very much linked to the above discussion), but they apply to classifiers which can be specified using only a subset of the training examples. However, note that in the case of Matching Pursuit, the classifier requires not only the support vectors but also the weights α_i, which in general depend on the whole training set.
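The bound (11) is straightforward to evaluate numerically; the following small sketch does so for illustrative values of ℓ, e, N, p and η (the numbers are arbitrary examples, not results from the paper).

import math

def log_binomial(n, k):
    """Natural log of (n choose k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def kmp_mdl_bound(l, e, N, p, eta):
    """Right-hand side of equation (11)."""
    bits = log_binomial(l, e) + log_binomial(l, N) + N * p * math.log(2)
    return (2.0 * bits - 2.0 * math.log(eta)) / l

# Example: 10000 training points, 20 errors, 50 support vectors, 8-bit weights, eta = 0.05
print(kmp_mdl_bound(10000, 20, 50, 8, 0.05))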

6 Experimental results on binary classification

Throughout this section:

any mention of KMP without further specification of the loss function means least-squares KMP (also sometimes written KMP-mse);

KMP-tanh refers to KMP using squared error after a hyperbolic tangent with modified targets (which behaves more like a typical margin loss function, as we discussed earlier in section 3.2).

Unless otherwise specified, we used the prefitting matching pursuit algorithm of figure 3 to train least-squares KMP. To train KMP-tanh we always used the backfitting matching pursuit with non-squared loss algorithm of figure 4, with a conjugate gradient optimizer to optimize the α_{1..n}⁵.

6.1 2D experiments

Figure 6 shows a simple 2D binary classification problem with the decision surface found by the three versions of squared-error KMP (basic, backfitting and prefitting) and a hard-margin SVM, when using the same Gaussian kernel. We fixed the number N of support points for the prefitting and backfitting versions to be the same as the number of support points found by the SVM algorithm. The aim of this experiment was to illustrate the following points:

Basic KMP, after 100 iterations, during which it mostly cycled back to previously chosen support points to improve their weights, is still unable to separate the data points. This shows that the backfitting and prefitting versions are a useful improvement, while the basic algorithm appears to be a bad choice if we want sparse solutions.

The backfitting and prefitting KMP algorithms are able to find a reasonable solution (the solution found by prefitting looks slightly better in terms of margin), but choose different support vectors than the SVM, which are not necessarily close to the decision surface (as they are in SVMs). It should be noted that the Relevance Vector Machine (Tipping, 2000) similarly produces⁶ solutions in which the relevance vectors do not lie close to the border.

Figure 7, where we used a simple dot-product kernel (i.e. linear decision surfaces), illustrates a problem that can arise when using a least-squares fit: since the squared error penalizes large positive margins, the decision surface is drawn towards the cluster on the lower right, at the expense of a few misclassified points. As expected, the use of a tanh loss function appears to correct this problem.

⁵ We tried several frequencies at which to do the full backfitting, but it did not seem to have a real impact, as long as it was done often enough.
⁶ however, in a much more computationally intensive fashion.

Figure 6: From left to right: 100 iterations of basic KMP, 7 iterations of KMP backfitting, 7 iterations of KMP prefitting, and SVM. Classes are + and . Support vectors are circled. Prefitting KMP and SVM appear to find equally reasonable solutions, though using different support vectors. Only SVM chooses its support vectors close to the decision surface. Backfitting chooses yet another support set, and its decision surface appears to have a slightly worse margin. As for basic KMP, after 100 iterations during which it mostly cycled back to previously chosen support points to improve their weights, it appears to use more support vectors than the others while still being unable to separate the data points, and is thus a bad choice if we want sparse solutions.

Figure 7: Problem with the least-squares fit that leads KMP-mse (center) to misclassify points, but does not affect SVMs (left), and is successfully treated by KMP-tanh (right).

6.2 US Postal Service Database

The main purpose of this experiment was to complement the results of (Schölkopf et al., 1997) with those obtained using KMP-mse, which, as already mentioned, is equivalent to orthogonal least squares RBF (Chen, Cowan and Grant, 1991). In (Schölkopf et al., 1997) the RBF centers were chosen by unsupervised k-means clustering, in what they referred to as "Classical RBF", and a gradient descent optimization procedure was used to train the kernel weights.

We repeated the experiment using KMP-mse (equivalent to OLS-RBF) to find the support centers, with the same Gaussian kernel and the same training set (7300 patterns) and independent test set (2007 patterns) of preprocessed handwritten digits. Table 1 gives the number of errors obtained by the various algorithms on the tasks consisting of discriminating each digit versus all the others (see (Schölkopf et al., 1997) for more details). No validation data was used to choose the number of bases (support vectors) for the KMP. Instead, we trained with N equal to the number of support vectors obtained with the SVM, and also with N equal to half that number, to see whether a sparser KMP model would still yield good results. As can be seen, results obtained with KMP are comparable to those obtained for SVMs, contrary to the results obtained with k-means RBFs, and there is only a slight loss of performance when using as few as half the number of support vectors.

Table 1: USPS results: number of errors on the test set (2007 patterns), when using the same number of support vectors as found by the SVM (except the last row, which uses half #sv). Squared-error KMP (same as OLS-RBF) appears to perform as well as SVM.

Digit class | #sv | SVM | k-means RBF | KMP (same #sv) | KMP (half #sv)

6.3 Benchmark datasets

We did some further experiments on 5 well-known datasets from the UCI machine-learning databases, using Gaussian kernels of the form K(x_1, x_2) = e^{−‖x_1 − x_2‖² / σ²}.

A first series of experiments used the machinery of the Delve (Rasmussen et al., 1996) system to assess performance on the Mushrooms dataset. Hyper-parameters (the σ of the kernel, the box-constraint parameter C for soft-margin SVM, and the number of support points for KMP) were chosen automatically for each run using 10-fold cross-validation. The results for varying sizes of the training set are summarized in table 2. The p-values reported in the table are those computed automatically by the Delve system⁷.

Table 2: Results obtained on the mushrooms data set with the Delve system. KMP requires fewer support vectors, while none of the differences in error rates are significant.

size of train | KMP error | SVM error | p-value (t-test) | KMP #s.v. | SVM #s.v.

⁷ For each size, the Delve system did its estimations based on 8 disjoint training sets of the given size and 8 disjoint test sets of size 503, except for 1024, in which case it used 4 disjoint training sets of size 1024 and 4 test sets of size 1007.

For Wisconsin Breast Cancer, Sonar, Pima Indians Diabetes and Ionosphere, we used a slightly different procedure. The σ of the kernel was first fixed to a reasonable value for the given data set⁸. Then we used the following procedure: the dataset was randomly split into three equal-sized subsets for training, validation and testing. SVM, KMP-mse and KMP-tanh were then trained on the training set, while the validation set was used to choose the optimal box-constraint parameter C for SVMs⁹, and to do early stopping (decide on the number N of s.v.) for KMP. Finally, the trained models were tested on the independent test set.

This procedure was repeated 50 times over 50 different random splits of the dataset into train/validation/test to estimate confidence measures (p-values were computed using the resampled t-test (Nadeau and Bengio, 2000)). Table 3 reports the average error rate measured on the test sets, and the rounded average number of support vectors found by each algorithm.

As can be seen from these experiments, the error rates obtained are comparable, but the KMP versions appear to require fewer support vectors than SVMs. On these datasets, however (contrary to what we saw previously on 2D artificial data), KMP-tanh did not seem to give any significant improvement over KMP-mse. Even in other experiments where we added label noise, KMP-tanh didn't seem to improve generalization performance¹⁰.

Table 3: Results on 4 UCI-MLDB datasets. Again, error rates are not significantly different (values in parentheses are the p-values for the difference with SVMs), but KMPs require fewer support vectors.

Dataset | SVM error | KMP-mse error | KMP-tanh error | SVM #s.v. | KMP-mse #s.v. | KMP-tanh #s.v.
Wisc. Cancer | 3.41% | 3.40% (0.49) | 3.49% (0.45) | | |
Sonar | 20.6% | 21.0% (0.45) | 26.6% (0.16) | | |
Pima Indians | 24.1% | 23.9% (0.44) | 24.0% (0.49) | | |
Ionosphere | 6.51% | 6.87% (0.41) | 6.85% (0.40) | | |

⁸ These were chosen by trial and error using SVMs with a validation set and several values of C, and keeping what seemed the best σ; thus this choice was made to the advantage of SVMs (although they did not seem too sensitive to it) rather than KMP. The values used were: 4.0 for Wisconsin Breast Cancer, 6.0 for Pima Indians Diabetes, 2.0 for Ionosphere and Sonar.
⁹ Values of 0.02, 0.05, 0.07, 0.1, 0.5, 1, 2, 3, 5, 10, 20, 100 were tried for C.
¹⁰ We do not give a detailed account of these experiments here, as their primary intent was to show that the tanh error function could have an advantage over squared error in the presence of label noise, but the results were inconclusive.
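A sketch of the early-stopping part of this protocol, choosing the number N of support vectors on a validation set; it reuses the kmp_prefitting sketch from section 2.3 and exploits the fact that the n-step prefitting solution equals the least-squares fit on the first n greedily chosen columns. The split sizes, error measure and names are assumptions.

import numpy as np

def choose_N_by_validation(D_train, y_train, D_valid, y_valid, N_max):
    """Return the N with lowest validation classification error (y in {-1, +1})."""
    gammas, _ = kmp_prefitting(D_train, y_train, N_max)     # greedy order of support points
    best_N, best_err = 1, np.inf
    for n in range(1, N_max + 1):
        cols = gammas[:n]
        # the n-step prefitting solution is the least-squares fit on the first n chosen columns
        coef, *_ = np.linalg.lstsq(D_train[:, cols], y_train, rcond=None)
        err = np.mean(np.sign(D_valid[:, cols] @ coef) != y_valid)
        if err < best_err:
            best_N, best_err = n, err
    return best_N, best_err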

7 Conclusion

We have shown how Matching Pursuit provides a flexible framework to build and study alternative kernel-based methods, how it can be extended to use arbitrary differentiable loss functions, and how it relates to SVMs, RBF training procedures, and boosting methods. We have also provided experimental evidence that such greedy constructive algorithms can perform as well as SVMs, while allowing better control of the sparsity of the solution, and thus often lead to solutions with far fewer support vectors. It should also be mentioned that the use of a dictionary gives additional flexibility, as it can be used, for instance, to mix several kernel shapes to choose from, similar to what has been done in (Weston et al., 1999), or to include other non-kernel functions based on prior knowledge, which opens the way to further research.

References

Boser, B., Guyon, I., and Vapnik, V. (1992). An algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, Pittsburgh.

Chen, S. (1995). Basis Pursuit. PhD thesis, Department of Statistics, Stanford University.

Chen, S., Cowan, F., and Grant, P. (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2(2).

Davis, G., Mallat, S., and Zhang, Z. (1994). Adaptive time-frequency decompositions. Optical Engineering, 33(7).

Floyd, S. and Warmuth, M. (1995). Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3).

Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference.

Friedman, J. (1999). Greedy function approximation: a gradient boosting machine. IMS 1999 Reitz Lecture, February 24, 1999, Dept. of Statistics, Stanford University.

Friedman, J., Hastie, T., and Tibshirani, R. (1998). Additive logistic regression: a statistical view of boosting. Technical report, August 1998, Department of Statistics, Stanford University.

Graepel, T., Herbrich, R., and Shawe-Taylor, J. (2000). Generalization error bounds for sparse linear classifiers. In Thirteenth Annual Conference on Computational Learning Theory, 2000, in press. Morgan Kaufmann.

Littlestone, N. and Warmuth, M. (1986). Relating data compression and learnability. Unpublished manuscript, University of California Santa Cruz. An extended version can be found in (Floyd and Warmuth, 1995).

Mallat, S. and Zhang, Z. (1993). Matching pursuit with time-frequency dictionaries. IEEE Trans. Signal Proc., 41(12).

Mason, L., Baxter, J., Bartlett, P., and Frean, M. (2000). Boosting algorithms as gradient descent. In Solla, S. A., Leen, T. K., and Müller, K.-R., editors, Advances in Neural Information Processing Systems, volume 12. MIT Press.

Nadeau, C. and Bengio, Y. (2000). Inference for the generalization error. In Solla, S. A., Leen, T. K., and Müller, K.-R., editors, Advances in Neural Information Processing Systems, volume 12. MIT Press.

Pati, Y., Rezaiifar, R., and Krishnaprasad, P. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers.

Poggio, T. and Girosi, F. (1998). A sparse representation for function approximation. Neural Computation, 10(6).

Rasmussen, C., Neal, R., Hinton, G., van Camp, D., Ghahramani, Z., Kustra, R., and Tibshirani, R. (1996). The DELVE manual. DELVE can be found at delve.

Schapire, R. E., Freund, Y., Bartlett, P., and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5).

Schölkopf, B., Sung, K., Burges, C., Girosi, F., Niyogi, P., Poggio, T., and Vapnik, V. (1997). Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Transactions on Signal Processing, 45.

Smola, A. and Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. In Langley, P., editor, International Conference on Machine Learning, San Francisco. Morgan Kaufmann.

Tipping, M. (2000). The relevance vector machine. In Solla, S. A., Leen, T. K., and Müller, K.-R., editors, Advances in Neural Information Processing Systems, volume 12. MIT Press.

Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.

Vapnik, V. (1998). Statistical Learning Theory. Wiley, Lecture Notes in Economics and Mathematical Systems, volume 454.

Weston, J., Gammerman, A., Stitson, M., Vapnik, V., Vovk, V., and Watkins, C. (1999). Density estimation using support vector machines. In Schölkopf, B., Burges, C. J. C., and Smola, A. J., editors, Advances in Kernel Methods — Support Vector Learning, Cambridge, MA. MIT Press.


More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Fast Blind Recognition of Channel Codes

Fast Blind Recognition of Channel Codes Fast Bind Recognition of Channe Codes Reza Moosavi and Erik G. Larsson Linköping University Post Print N.B.: When citing this work, cite the origina artice. 213 IEEE. Persona use of this materia is permitted.

More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

Soft Clustering on Graphs

Soft Clustering on Graphs Soft Custering on Graphs Kai Yu 1, Shipeng Yu 2, Voker Tresp 1 1 Siemens AG, Corporate Technoogy 2 Institute for Computer Science, University of Munich kai.yu@siemens.com, voker.tresp@siemens.com spyu@dbs.informatik.uni-muenchen.de

More information

Haar Decomposition and Reconstruction Algorithms

Haar Decomposition and Reconstruction Algorithms Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria

More information

BP neural network-based sports performance prediction model applied research

BP neural network-based sports performance prediction model applied research Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied

More information

https://doi.org/ /epjconf/

https://doi.org/ /epjconf/ HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne 17th European Signa Processing Conference (EUSIPCO 2009) Gasgow, Scotand, August 24-28, 2009 PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR Pierric Bruneau, Marc Gegon and Fabien

More information

AST 418/518 Instrumentation and Statistics

AST 418/518 Instrumentation and Statistics AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the

More information

Paragraph Topic Classification

Paragraph Topic Classification Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 54 Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing

More information

4 1-D Boundary Value Problems Heat Equation

4 1-D Boundary Value Problems Heat Equation 4 -D Boundary Vaue Probems Heat Equation The main purpose of this chapter is to study boundary vaue probems for the heat equation on a finite rod a x b. u t (x, t = ku xx (x, t, a < x < b, t > u(x, = ϕ(x

More information

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees Improving the Accuracy of Booean Tomography by Expoiting Path Congestion Degrees Zhiyong Zhang, Gaoei Fei, Fucai Yu, Guangmin Hu Schoo of Communication and Information Engineering, University of Eectronic

More information

BDD-Based Analysis of Gapped q-gram Filters

BDD-Based Analysis of Gapped q-gram Filters BDD-Based Anaysis of Gapped q-gram Fiters Marc Fontaine, Stefan Burkhardt 2 and Juha Kärkkäinen 2 Max-Panck-Institut für Informatik Stuhsatzenhausweg 85, 6623 Saarbrücken, Germany e-mai: stburk@mpi-sb.mpg.de

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

A Better Way to Pretrain Deep Boltzmann Machines

A Better Way to Pretrain Deep Boltzmann Machines A Better Way to Pretrain Deep Botzmann Machines Rusan Saakhutdino Department of Statistics and Computer Science Uniersity of Toronto rsaakhu@cs.toronto.edu Geoffrey Hinton Department of Computer Science

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

Universal Consistency of Multi-Class Support Vector Classification

Universal Consistency of Multi-Class Support Vector Classification Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the

More information

Appendix A: MATLAB commands for neural networks

Appendix A: MATLAB commands for neural networks Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES ANDRZEJ DUDEK AND ANDRZEJ RUCIŃSKI Abstract. For positive integers k and, a k-uniform hypergraph is caed a oose path of ength, and denoted by

More information

Emmanuel Abbe Colin Sandon

Emmanuel Abbe Colin Sandon Detection in the stochastic bock mode with mutipe custers: proof of the achievabiity conjectures, acycic BP, and the information-computation gap Emmanue Abbe Coin Sandon Abstract In a paper that initiated

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing theorem to characterize the stationary distribution of the stochastic process with SDEs in (3) Theorem 3

More information

Some Properties of Regularized Kernel Methods

Some Properties of Regularized Kernel Methods Journa of Machine Learning Research 5 (2004) 1363 1390 Submitted 12/03; Revised 7/04; Pubished 10/04 Some Properties of Reguarized Kerne Methods Ernesto De Vito Dipartimento di Matematica Università di

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Nonlinear Gaussian Filtering via Radial Basis Function Approximation

Nonlinear Gaussian Filtering via Radial Basis Function Approximation 51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This

More information

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department Physics 8.07: Eectromagnetism II October 7, 202 Prof. Aan Guth LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL

More information

<C 2 2. λ 2 l. λ 1 l 1 < C 1

<C 2 2. λ 2 l. λ 1 l 1 < C 1 Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima

More information

V.B The Cluster Expansion

V.B The Cluster Expansion V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f(q ) = exp ( βv( q )) 1, which is obtained by summing over

More information

Adaptive Regularization for Transductive Support Vector Machine

Adaptive Regularization for Transductive Support Vector Machine Adaptive Reguarization for Transductive Support Vector Machine Zengin Xu Custer MMCI Saarand Univ. & MPI INF Saarbrucken, Germany zxu@mpi-inf.mpg.de Rong Jin Computer Sci. & Eng. Michigan State Univ. East

More information

More Scattering: the Partial Wave Expansion

More Scattering: the Partial Wave Expansion More Scattering: the Partia Wave Expansion Michae Fower /7/8 Pane Waves and Partia Waves We are considering the soution to Schrödinger s equation for scattering of an incoming pane wave in the z-direction

More information

Kernel Trick Embedded Gaussian Mixture Model

Kernel Trick Embedded Gaussian Mixture Model Kerne Trick Embedded Gaussian Mixture Mode Jingdong Wang, Jianguo Lee, and Changshui Zhang State Key Laboratory of Inteigent Technoogy and Systems Department of Automation, Tsinghua University Beijing,

More information

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers Ant Coony Agorithms for Constructing Bayesian Muti-net Cassifiers Khaid M. Saama and Aex A. Freitas Schoo of Computing, University of Kent, Canterbury, UK. {kms39,a.a.freitas}@kent.ac.uk December 5, 2013

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

V.B The Cluster Expansion

V.B The Cluster Expansion V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f( q ) = exp ( βv( q )), which is obtained by summing over

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS ECCM6-6 TH EUROPEAN CONFERENCE ON COMPOSITE MATERIALS, Sevie, Spain, -6 June 04 THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS M. Wysocki a,b*, M. Szpieg a, P. Heström a and F. Ohsson c a Swerea SICOMP

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

4 Separation of Variables

4 Separation of Variables 4 Separation of Variabes In this chapter we describe a cassica technique for constructing forma soutions to inear boundary vaue probems. The soution of three cassica (paraboic, hyperboic and eiptic) PDE

More information

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design Contro Chart For Monitoring Nonparametric Profies With Arbitrary Design Peihua Qiu 1 and Changiang Zou 2 1 Schoo of Statistics, University of Minnesota, USA 2 LPMC and Department of Statistics, Nankai

More information

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c) A Simpe Efficient Agorithm of 3-D Singe-Source Locaization with Uniform Cross Array Bing Xue a * Guangyou Fang b Yicai Ji c Key Laboratory of Eectromagnetic Radiation Sensing Technoogy, Institute of Eectronics,

More information