
SimpleMKL

Alain Rakotomamonjy (alain.rakotomamonjy@insa-rouen.fr)
LITIS EA 4108, Université de Rouen
76801 Saint Etienne du Rouvray, France

Francis Bach (francis.bach@mines.org)
INRIA - Willow project, Département d'Informatique, Ecole Normale Supérieure
45, Rue d'Ulm, 75230 Paris, France

Stéphane Canu (stephane.canu@insa-rouen.fr)
LITIS EA 4108, INSA de Rouen
76801 Saint Etienne du Rouvray, France

Yves Grandvalet (yves.grandvalet@idiap.ch)
CNRS/IDIAP Research Institute, Centre du Parc, Av. des Prés-Beudin 20
1920 Martigny, Switzerland

Abstract

Multiple kernel learning aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning (MKL) algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes the MKL approach tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. Furthermore, we show how SimpleMKL can be applied beyond binary classification, to problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges rapidly and that its efficiency compares favorably to other MKL algorithms. Finally, we illustrate the usefulness of MKL for some regressors based on wavelet kernels and on some model selection problems related to multiclass classification. A SimpleMKL toolbox is available at

1. Introduction

During the last few years, kernel methods, such as support vector machines (SVM), have proved to be efficient tools for solving learning problems like classification or regression (Schölkopf and Smola, 2001). For such tasks, the performance of the learning algorithm strongly depends on the data representation. In kernel methods, the data representation is implicitly chosen through the so-called kernel K(x, x'). This kernel actually plays two roles: it defines the similarity between two examples x and x', while defining an appropriate regularization term for the learning problem.

Let {x_i, y_i}_{i=1}^{l} be the learning set, where x_i belongs to some input space X and y_i is the target value for pattern x_i. For kernel algorithms, the solution of the learning problem is of the form

    f(x) = Σ_{i=1}^{l} α_i K(x, x_i) + b,                                            (1)

where α_i and b are some coefficients to be learned from examples, while K(·,·) is a given positive definite kernel associated with a reproducing kernel Hilbert space (RKHS) H.

In some situations, a machine learning practitioner may be interested in more flexible models. Recent applications have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve performances (Lanckriet et al., 2004a). In such cases, a convenient approach is to consider that the kernel K(x, x') is actually a convex combination of basis kernels:

    K(x, x') = Σ_{m=1}^{M} d_m K_m(x, x'),   with  d_m ≥ 0,  Σ_{m=1}^{M} d_m = 1,

where M is the total number of kernels. Each basis kernel K_m may either use the full set of variables describing x or subsets of variables stemming from different data sources (Lanckriet et al., 2004a). Alternatively, the kernels K_m can simply be classical kernels (such as Gaussian kernels) with different parameters. Within this framework, the problem of data representation through the kernel is then transferred to the choice of weights d_m.

Learning both the coefficients α_i and the weights d_m in a single optimization problem is known as the multiple kernel learning (MKL) problem. For binary classification, the MKL problem has been introduced by Lanckriet et al. (2004b), resulting in a quadratically constrained quadratic programming problem that becomes rapidly intractable as the number of learning examples or kernels becomes large. What makes this problem difficult is that it is actually a convex but non-smooth minimization problem. Indeed, Bach et al. (2004a) have shown that the MKL formulation of Lanckriet et al. (2004b) is actually the dual of an SVM problem in which the weight vector has been regularized according to a mixed (ℓ2, ℓ1)-norm instead of the classical squared ℓ2-norm. Bach et al. (2004a) have considered a smoothed version of the problem, for which they proposed an SMO-like algorithm that enables them to tackle medium-scale problems. Sonnenburg et al. (2006) reformulate the MKL problem of Lanckriet et al. (2004b) as a semi-infinite linear program (SILP). The advantage of the latter formulation is that the algorithm addresses the problem by iteratively solving a classical SVM problem with a single kernel, for which many efficient toolboxes exist (Vishwanathan et al., 2003, Loosli et al.,

2005, Chang and Lin, 2001), and a linear program whose number of constraints increases along with iterations. A very nice feature of this algorithm is that it can be extended to a large class of convex loss functions. For instance, Zien and Ong (2007) have proposed a multiclass MKL algorithm based on similar ideas.

In this paper, we present another formulation of the multiple kernel learning problem. We first depart from the primal formulation proposed by Bach et al. (2004a) and further used by Bach et al. (2004b) and Sonnenburg et al. (2006). Indeed, we replace the mixed-norm regularization by a weighted ℓ2-norm regularization, where the sparsity of the linear combination of kernels is controlled by an ℓ1-norm constraint on the kernel weights. This new formulation of MKL leads to a smooth and convex optimization problem. By using a variational formulation of the mixed-norm regularization, we show that our formulation is equivalent to the ones of Lanckriet et al. (2004b), Bach et al. (2004a) and Sonnenburg et al. (2006).

The main contribution of this paper is to propose an efficient algorithm, named SimpleMKL, for solving the MKL problem through a primal formulation involving a weighted ℓ2-norm regularization. Indeed, our algorithm is simple, essentially based on a gradient descent on the SVM objective value. We iteratively determine the combination of kernels by a gradient descent wrapping a standard SVM solver, which is SimpleSVM in our case. Our scheme is similar to the one of Sonnenburg et al. (2006), and both algorithms minimize the same objective function. However, they differ in that we use reduced gradient descent in the primal, whereas Sonnenburg et al.'s SILP relies on cutting planes. We will empirically show that our optimization strategy is more efficient, with new evidence confirming the preliminary results reported in Rakotomamonjy et al. (2007). Then, extensions of SimpleMKL to other supervised learning problems such as regression SVM, one-class SVM or multiclass SVM problems based on pairwise coupling are proposed. Although it is not the main purpose of the paper, we will also discuss the applicability of our approach to general convex loss functions.

This paper also presents several illustrations of the usefulness of our algorithm. For instance, in addition to the empirical efficiency comparison, we also show, in an SVM regression problem involving wavelet kernels, that automatic learning of the kernels leads to far better performances. Then we depict how our MKL algorithm behaves on some multiclass problems.

The paper is organized as follows. Section 2 presents the functional setting of our MKL problem and its formulation. Details on the algorithm and some analysis, like a discussion about convergence and computational complexity, are given in Section 3. Extensions of our algorithm to other SVM problems are discussed in Section 4, while experimental results dealing with computational complexity or with comparison with other model selection methods are presented in Section 5. A SimpleMKL toolbox based on Matlab code is available at fr/enseignants/~arakoto/code/mklindex.html. This toolbox is an extension of our SimpleSVM toolbox (Canu et al., 2003).

2. Multiple Kernel Learning Framework

In this section, we present our MKL formulation and derive its dual. In the sequel, i and j are indices on examples, whereas m is the kernel index. In order to lighten notations,

we omit to specify that summations on i and j go from 1 to l, and that summations on m go from 1 to M.

2.1 Functional framework

Before entering into the details of the MKL optimization problem, we first present the functional framework adopted for multiple kernel learning. Assume K_m, m = 1, ..., M are M positive definite kernels on the same input space X, each of them being associated with an RKHS H_m endowed with an inner product ⟨·,·⟩_m. For any m, let d_m be a non-negative coefficient and H'_m be the Hilbert space derived from H_m as follows:

    H'_m = { f | f ∈ H_m : ‖f‖_{H_m} / d_m < ∞ },

endowed with the inner product

    ⟨f, g⟩_{H'_m} = (1/d_m) ⟨f, g⟩_m .

In this paper, we use the convention that x/0 = 0 if x = 0 and ∞ otherwise. This means that, if d_m = 0, then a function f belongs to the Hilbert space H'_m only if f = 0 ∈ H_m. In such a case, H'_m is restricted to the null element of H_m. Within this framework, H'_m is an RKHS with kernel K(x, x') = d_m K_m(x, x') since

    ∀ f ∈ H'_m ⊆ H_m,   f(x) = ⟨f(·), K_m(x, ·)⟩_m = (1/d_m) ⟨f(·), d_m K_m(x, ·)⟩_m = ⟨f(·), d_m K_m(x, ·)⟩_{H'_m}.

Now, if we define H as the direct sum of the spaces H'_m, i.e.,

    H = ⊕_{m=1}^{M} H'_m,

then a classical result on RKHS (Aronszajn, 1950) says that H is an RKHS of kernel

    K(x, x') = Σ_{m=1}^{M} d_m K_m(x, x').

Owing to this simple construction, we have built an RKHS H for which any function is a sum of functions belonging to the spaces H'_m. In our framework, MKL aims at determining the set of coefficients {d_m} within the learning process of the decision function. The multiple kernel learning problem can thus be envisioned as learning a predictor belonging to an adaptive hypothesis space endowed with an adaptive inner product. The forthcoming sections explain how we solve this problem.
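At the level of Gram matrices, the construction above reduces to forming the weighted sum K = Σ_m d_m K_m with d on the simplex. The following minimal numpy sketch (not from the paper; the kernel choices and names are purely illustrative) builds such a combined Gram matrix from precomputed basis kernels:

```python
import numpy as np

def combined_gram(gram_list, d):
    """Weighted sum K = sum_m d_m K_m of precomputed basis Gram matrices."""
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0) and np.isclose(d.sum(), 1.0), "d must lie on the simplex"
    K = np.zeros_like(gram_list[0], dtype=float)
    for d_m, K_m in zip(d, gram_list):
        if d_m > 0:                      # kernels with d_m = 0 simply drop out of the sum
            K += d_m * K_m
    return K

# Example: three Gaussian kernels with different bandwidths on the same data.
rng = np.random.RandomState(0)
X = rng.randn(20, 5)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
grams = [np.exp(-sq_dists / (2 * s ** 2)) for s in (0.5, 1.0, 2.0)]
d0 = np.ones(len(grams)) / len(grams)    # uniform initialization d_m = 1/M
K = combined_gram(grams, d0)
```

Basis kernels whose weight d_m vanishes drop out of the combination, which is what produces the kernel selection effect discussed in the following sections.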

2.2 Multiple kernel learning primal problem

In the SVM methodology, the decision function is of the form given in equation (1), where the optimal parameters α_i and b are obtained by solving the dual of the following optimization problem:

    min_{f,b,ξ}   (1/2) ‖f‖²_H + C Σ_i ξ_i
    s.t.   y_i (f(x_i) + b) ≥ 1 − ξ_i   ∀i,
           ξ_i ≥ 0                      ∀i.

In the MKL framework, one looks for a decision function of the form f(x) + b = Σ_m f_m(x) + b, where each function f_m belongs to a different RKHS H_m associated with a kernel K_m. According to the above functional framework and inspired by the multiple smoothing splines framework of Wahba (1990, chap. 10), we propose to address the MKL SVM problem by solving the following convex problem (see proof of convexity in the appendix), which will be referred to as the primal MKL problem:

    min_{{f_m},b,ξ,d}   (1/2) Σ_m (1/d_m) ‖f_m‖²_{H_m} + C Σ_i ξ_i
    s.t.   y_i Σ_m f_m(x_i) + y_i b ≥ 1 − ξ_i   ∀i,
           ξ_i ≥ 0                              ∀i,                                  (2)
           Σ_m d_m = 1,   d_m ≥ 0               ∀m,

where each d_m controls the squared norm of f_m in the objective function. The smaller d_m is, the smoother f_m (as measured by ‖f_m‖_{H_m}) should be. When d_m = 0, ‖f_m‖_{H_m} also has to be equal to zero to yield a finite objective value. The ℓ1-norm constraint on the vector d is a sparsity constraint that will force some d_m to be zero, thus encouraging sparse basis kernel expansions.

2.3 Connections with the mixed-norm regularization formulation of MKL

The MKL formulation introduced by Bach et al. (2004a) and further developed by Sonnenburg et al. (2006) consists in solving an optimization problem expressed in functional form as

    min_{{f_m},b,ξ}   (1/2) ( Σ_m ‖f_m‖_{H_m} )² + C Σ_i ξ_i
    s.t.   y_i Σ_m f_m(x_i) + y_i b ≥ 1 − ξ_i   ∀i,                                  (3)
           ξ_i ≥ 0                              ∀i.

Note that the objective function of this problem is not smooth since ‖f_m‖_{H_m} is not differentiable at f_m = 0. However, what makes this formulation interesting is that the mixed-norm penalization of f = Σ_m f_m is a soft-thresholding penalizer that leads to a sparse solution, for which the algorithm performs kernel selection (Bach, 2007). We have stated in the previous section that our problem should also lead to sparse solutions. In the following, we show that formulations (2) and (3) are equivalent.

For this purpose, we simply show that the variational formulation of the mixed-norm regularization is equal to the weighted 2-norm regularization (which is a particular case of a more general equivalence proposed by Micchelli and Pontil, 2005), i.e., by the Cauchy-Schwarz inequality, for any vector d on the simplex:

    ( Σ_m ‖f_m‖_{H_m} )² = ( Σ_m (‖f_m‖_{H_m} / d_m^{1/2}) d_m^{1/2} )²
                         ≤ ( Σ_m ‖f_m‖²_{H_m} / d_m ) ( Σ_m d_m ) = Σ_m ‖f_m‖²_{H_m} / d_m ,

which leads to

    min_{d ≥ 0, Σ_m d_m = 1}  Σ_m ‖f_m‖²_{H_m} / d_m  =  ( Σ_m ‖f_m‖_{H_m} )².           (4)

Moreover, the equality case in the Cauchy-Schwarz inequality leads to the following optimal vector d:

    d_m = ‖f_m‖_{H_m} / Σ_q ‖f_q‖_{H_q} .                                              (5)

Hence, owing to this variational formulation, the non-smooth mixed-norm objective function of problem (3) has been turned into a smooth objective function in problem (2). Although the number of variables has increased, we will see that this problem can be solved more efficiently.

2.4 The MKL dual problem

The dual problem is a key point for deriving MKL algorithms and for studying their convergence properties. Since our primal problem (2) is equivalent to the one of Bach et al. (2004a), they lead to the same dual. However, our primal formulation being convex and differentiable, it provides a simple derivation of the dual, which does not use conic duality. The Lagrangian of problem (2) is

    L = (1/2) Σ_m (1/d_m) ‖f_m‖²_{H_m} + C Σ_i ξ_i
        + Σ_i α_i ( 1 − ξ_i − y_i Σ_m f_m(x_i) − y_i b ) − Σ_i ν_i ξ_i
        + λ ( Σ_m d_m − 1 ) − Σ_m η_m d_m ,                                            (6)

where α_i and ν_i are the Lagrange multipliers of the constraints related to the usual SVM problem, whereas λ and η_m are associated with the constraints on d_m. When setting to zero the gradient of the Lagrangian with respect to the primal variables, we get the following optimality conditions:

    (a)  (1/d_m) f_m(·) = Σ_i α_i y_i K_m(·, x_i),   ∀m,
    (b)  Σ_i α_i y_i = 0,
    (c)  C − α_i − ν_i = 0,   ∀i,                                                      (7)
    (d)  −(1/2) ‖f_m‖²_{H_m} / d_m² + λ − η_m = 0,   ∀m.

We note again here that f_m(·) has to go to 0 if the coefficient d_m vanishes. Plugging these optimality conditions into the Lagrangian gives the dual problem

    max_{α,λ}   Σ_i α_i − λ
    s.t.   Σ_i α_i y_i = 0,
           0 ≤ α_i ≤ C   ∀i,                                                           (8)
           (1/2) Σ_{i,j} α_i α_j y_i y_j K_m(x_i, x_j) ≤ λ   ∀m.

This dual problem is difficult to optimize due to the last constraint. This constraint may be moved to the objective function, but then the latter becomes non-differentiable, causing new difficulties (Bach et al., 2004a). Hence, in the forthcoming section, we propose an approach based on the minimization of the primal. In this framework, we benefit from the differentiability of the problem, which allows an efficient derivation of approximate primal solutions, whose accuracy will be monitored by the duality gap.

3. Algorithm for solving the MKL primal problem

One possible approach for solving problem (2) is to use the alternate optimization algorithm applied by Grandvalet and Canu (1999, 2003) in other contexts. In the first step, problem (2) is optimized with respect to f_m, b and ξ, considering that d is fixed. Then, in the second step, the weight vector d is updated to decrease the objective function of problem (2), with f_m, b and ξ being fixed. In Section 2.3, we showed that the second step can be carried out in closed form. However, this approach lacks convergence guarantees and may lead to numerical problems, in particular when some elements of d approach zero (Grandvalet, 1998). Note that these numerical problems can be handled by introducing a perturbed version of the alternate algorithm, as shown by Argyriou et al. (2008).

Instead of using an alternate optimization algorithm, we prefer to consider here the following constrained optimization problem:

    min_d J(d)   such that   Σ_{m=1}^{M} d_m = 1,   d_m ≥ 0,                           (9)

where

    J(d) =  min_{{f_m},b,ξ}   (1/2) Σ_m (1/d_m) ‖f_m‖²_{H_m} + C Σ_i ξ_i
            s.t.   y_i Σ_m f_m(x_i) + y_i b ≥ 1 − ξ_i   ∀i,   ξ_i ≥ 0   ∀i.           (10)

We show below how to solve problem (9) on the simplex by a simple gradient method. We will first note that the objective function J(d) is actually an optimal SVM objective value. We will then discuss the existence and computation of the gradient of J(·), which is at the core of the proposed approach.

1. Note that the formulation of Bach et al. (2004a) differs slightly, in that the kernels are weighted by some pre-defined coefficients that were not considered here.
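Since J(d) is nothing but the optimal objective value of a standard SVM trained with the combined kernel, it can be evaluated with any off-the-shelf solver. The sketch below is an illustration only: it uses scikit-learn's SVC with a precomputed kernel as a stand-in for the Matlab SimpleSVM solver wrapped in the paper, and the way α is recovered from the solver is an assumption of this sketch.

```python
import numpy as np
from sklearn.svm import SVC

def svm_objective(gram_list, d, y, C):
    """Return J(d) and the dual variables alpha of an SVM trained on K = sum_m d_m K_m.

    y is expected to be a vector of +1/-1 labels."""
    K = sum(d_m * K_m for d_m, K_m in zip(d, gram_list))
    clf = SVC(C=C, kernel="precomputed").fit(K, y)
    alpha = np.zeros(len(y))
    alpha[clf.support_] = np.abs(clf.dual_coef_).ravel()   # dual_coef_ stores y_i * alpha_i
    # J(d) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j), cf. eq. (12)
    ay = alpha * y
    J = alpha.sum() - 0.5 * ay @ K @ ay
    return J, alpha
```

The returned α is exactly what the gradient computation of the next section needs, so a single SVM call per candidate d provides both the objective value and its derivatives.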

3.1 Computing the optimal SVM value and its derivatives

The Lagrangian of problem (10) is identical to the first line of equation (6). By setting to zero the derivatives of this Lagrangian with respect to the primal variables, we get conditions (7) (a) to (c), from which we derive the associated dual problem

    max_α   −(1/2) Σ_{i,j} α_i α_j y_i y_j Σ_m d_m K_m(x_i, x_j) + Σ_i α_i
    with    Σ_i α_i y_i = 0,                                                           (11)
            0 ≤ α_i ≤ C   ∀i,

which is identified as the standard SVM dual formulation using the combined kernel K(x_i, x_j) = Σ_m d_m K_m(x_i, x_j). Function J(d) is defined as the optimal objective value of problem (10). Because of strong duality, J(d) is also the objective value of the dual problem:

    J(d) = −(1/2) Σ_{i,j} α*_i α*_j y_i y_j Σ_m d_m K_m(x_i, x_j) + Σ_i α*_i ,          (12)

where α* maximizes (11). Note that the objective value J(d) can be obtained by any SVM algorithm. Our method can thus take advantage of any progress in single kernel algorithms. In particular, if the SVM algorithm we use is able to handle large-scale problems, so will our MKL algorithm. Thus, the overall complexity of SimpleMKL is tied to that of the single kernel SVM algorithm.

From now on, we assume that each Gram matrix (K_m(x_i, x_j))_{i,j} is positive definite, with all eigenvalues greater than some η > 0 (to enforce this property, a small ridge may be added to the diagonal of the Gram matrices). This implies that, for any admissible value of d, the dual problem is strictly concave with convexity parameter η (Lemaréchal and Sagastizabal, 1997). In turn, this strict concavity property ensures that α* is unique, a characteristic that eases the analysis of the differentiability of J(·).

The existence and computation of the derivatives of optimal value functions such as J(·) have been largely discussed by Bonnans and Shapiro (1998). For our purpose, the appropriate reference is Theorem 4.1 in Bonnans and Shapiro (1998), which has already been applied by Chapelle et al. (2002) for the special case of the squared-hinge loss SVM. This theorem is reproduced in the appendix for self-containedness. In a nutshell, this theorem says that differentiability of J(d) is ensured by the unicity of α*, and by the differentiability of the objective function that gives J(d). Furthermore, the derivatives of J(d) can be computed as if α* did not depend on d. Thus, by simple differentiation of the dual function (11) with respect to d_m, we have:

    ∂J/∂d_m = −(1/2) Σ_{i,j} α*_i α*_j y_i y_j K_m(x_i, x_j).                           (13)

We will see in the sequel that the applicability of this theorem can be extended to other SVM problems.
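Equation (13) shows that, once α* is available, the whole gradient is a collection of quadratic forms in α* ∘ y, one per basis kernel. A possible numpy sketch (again an illustration, not the paper's Matlab code):

```python
import numpy as np

def mkl_gradient(gram_list, alpha, y):
    """Gradient of J(d): dJ/dd_m = -1/2 sum_ij alpha_i alpha_j y_i y_j K_m(x_i, x_j), cf. eq. (13).

    alpha is held fixed at the optimum of the single-kernel SVM dual (11)."""
    ay = alpha * y
    return np.array([-0.5 * ay @ K_m @ ay for K_m in gram_list])

# Usage with the earlier sketches (illustrative):
#   J, alpha = svm_objective(grams, d0, y, C=1.0)
#   grad = mkl_gradient(grams, alpha, y)
```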

3.2 Reduced gradient algorithm

The optimization problem we have to deal with in (9) is a non-linear objective function with constraints over the simplex. With our positivity assumption on the kernel matrices, J(·) is convex and differentiable with Lipschitz gradient (Lemaréchal and Sagastizabal, 1997). The approach we use for solving this problem is a reduced gradient method, which converges for such functions (Luenberger, 1984).

Once the gradient of J(d) is computed, d is updated by using a descent direction ensuring that the equality constraint and the non-negativity constraints on d are satisfied. We handle the equality constraint by computing the reduced gradient (Luenberger, 1984, chap. 11). Let d_µ be a non-zero entry of d; the reduced gradient of J(d), denoted ∇_red J, has components

    [∇_red J]_m = ∂J/∂d_m − ∂J/∂d_µ   for m ≠ µ,   and   [∇_red J]_µ = Σ_{m ≠ µ} ( ∂J/∂d_µ − ∂J/∂d_m ).

We chose µ to be the index of the largest component of vector d, for better numerical stability (Bonnans, 2006).

The positivity constraints also have to be taken into account in the descent direction. Since we want to minimize J(·), −∇_red J is a descent direction. However, if there is an index m such that d_m = 0 and [∇_red J]_m > 0, using this direction would violate the positivity constraint for d_m. Hence, the descent direction for that component is set to 0. Eventually, the descent direction for updating d is

    D_m =  0                                                     if d_m = 0 and ∂J/∂d_m − ∂J/∂d_µ > 0,
           −∂J/∂d_m + ∂J/∂d_µ                                    if d_m > 0 and m ≠ µ,
           Σ_{ν ≠ µ, d_ν > 0} ( ∂J/∂d_ν − ∂J/∂d_µ )              for m = µ,

and the usual updating scheme is d ← d + γD, where γ is the step size. Our updating scheme, detailed in Algorithm 1, goes one step beyond: once a descent direction D has been computed, we first look for the maximal admissible step size in that direction and check whether the objective value decreases or not. The maximal admissible step size corresponds to a component, say d_ν, set to zero. If the objective value decreases, d is updated, we set D_ν = 0 and normalize D to comply with the equality constraint. This procedure is repeated until the objective value stops decreasing. At this point, we look for the optimal step size γ, which is determined by using a one-dimensional line search, with a proper stopping criterion, such as Armijo's rule, to ensure global convergence.

In this algorithm, the gradient of the cost function is not computed after each update of the weight vector d. Instead, we take advantage of an easily updated descent direction as long as a decrease in the objective value is possible. We will see in the numerical experiments that this approach saves a substantial amount of computation time. Note that we have also investigated a plain gradient projection algorithm (Bertsekas, 1999, chap. 2.3) for solving problem (9). The resulting update scheme was empirically observed to be slightly less efficient than the proposed approach, and we will not report its results.

Algorithm 1 SimpleMKL algorithm
    set d_m = 1/M for m = 1, ..., M
    while stopping criterion not met do
        solve the classical SVM problem with K = Σ_m d_m K_m
        compute ∂J/∂d_m for m = 1, ..., M and the descent direction D
        set µ = argmax_m d_m, ν = argmin_{m | D_m < 0} −d_m/D_m and γ_max = −d_ν/D_ν
        while J(d + γ_max D) < J(d) do
            d ← d + γ_max D
            D_µ ← D_µ − D_ν,  D_ν ← 0
            compute ν = argmin_{m | D_m < 0} −d_m/D_m and γ_max = −d_ν/D_ν
        end while
        line search for the optimal step γ ∈ [0, γ_max]
        d ← d + γD
    end while
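For concreteness, here is a hedged numpy sketch of the reduced-gradient direction and of the maximal admissible step used in Algorithm 1 above; the handling of components sitting at zero follows the description in Section 3.2, and the helper names and tolerances are illustrative choices, not part of the paper.

```python
import numpy as np

def descent_direction(d, grad, tol=1e-12):
    """Reduced-gradient descent direction on the simplex {d >= 0, sum_m d_m = 1}."""
    mu = int(np.argmax(d))                    # reference component: largest d_m (numerical stability)
    red = grad - grad[mu]                     # reduced gradient w.r.t. the equality constraint
    D = -red
    D[(d <= tol) & (red > 0)] = 0.0           # do not violate the positivity constraints
    D[mu] = -np.sum(D[np.arange(len(d)) != mu])   # enforce sum_m D_m = 0
    return D

def max_step(d, D, tol=1e-12):
    """Largest gamma such that d + gamma * D stays non-negative."""
    neg = D < -tol
    return np.inf if not np.any(neg) else float(np.min(-d[neg] / D[neg]))
```

In Algorithm 1, γ_max is recomputed each time a component d_ν hits zero, and a one-dimensional line search over [0, γ_max] then picks the final step.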

Also, note that the projection and the line-search techniques involve several queries of the objective function, and thus require the computation of several single kernel SVMs with small variations of d. This may be very costly, but it can be sped up by initializing the SVM algorithm with previous values of α (DeCoste and Wagstaff, 2000).

The above described steps of the algorithm are performed until a stopping criterion is met. This stopping criterion can be based either on the duality gap, the KKT conditions, the variation of d between two consecutive steps or, even more simply, on a maximal number of iterations. Our implementation, based on the duality gap, is detailed in the forthcoming section.

3.3 Optimality conditions

In a convex constrained optimization algorithm like the one we are considering, we have the opportunity to check for proper optimality conditions such as the KKT conditions or the duality gap. Since our formulation of the MKL problem is convex, convergence can be monitored through the duality gap (the difference between primal and dual objective values), which should be zero at the optimum. From the primal and dual objectives provided respectively in (2) and (8), the MKL duality gap is

    DualGap = J(d*) − Σ_i α*_i + (1/2) max_m Σ_{i,j} α*_i α*_j y_i y_j K_m(x_i, x_j),

where d* and {α*_i} are the optimal primal and dual variables, and J(d*) depends implicitly on the optimal primal variables {f*_m}, b* and {ξ*_i}. If J(d*) has been obtained through the dual problem (11), then this duality gap can also be related to the one of the single kernel SVM

algorithm, DG_SVM. Indeed, substituting J(d) with its expression in (12) yields

    DualGap = DG_SVM − (1/2) Σ_{i,j} α*_i α*_j y_i y_j Σ_m d_m K_m(x_i, x_j)
              + (1/2) max_m Σ_{i,j} α*_i α*_j y_i y_j K_m(x_i, x_j).

Hence, the MKL duality gap can be obtained at a small additional computational cost compared to the SVM duality gap. In iterative procedures, it is common to stop the algorithm when the optimality conditions are respected up to a tolerance threshold ε. Obviously, SimpleMKL has no impact on DG_SVM, hence one may assume, as we did here, that DG_SVM need not be monitored. Consequently, we terminate the algorithm when

    max_m Σ_{i,j} α*_i α*_j y_i y_j K_m(x_i, x_j) − Σ_{i,j} α*_i α*_j y_i y_j Σ_m d_m K_m(x_i, x_j) ≤ ε.   (14)

For some of the other MKL algorithms that will be presented in Section 4, the dual function may be more difficult to derive. Hence, it may be easier to rely on approximate KKT conditions as a stopping criterion. For the general MKL problem (9), the first order optimality conditions are obtained through the KKT conditions:

    ∂J/∂d_m + λ − η_m = 0,   ∀m,
    η_m d_m = 0,             ∀m,

where λ and {η_m} are the Lagrange multipliers of respectively the equality and inequality constraints of (9). These KKT conditions imply

    ∂J/∂d_m = −λ   if d_m > 0,
    ∂J/∂d_m ≥ −λ   if d_m = 0.

However, as Algorithm 1 is not based on the Lagrangian formulation of problem (9), λ is not computed. Hence, we derive approximate necessary optimality conditions to be used as a termination criterion. Let us define dJ_min and dJ_max as

    dJ_min = min_{m | d_m > 0} ∂J/∂d_m   and   dJ_max = max_{m | d_m > 0} ∂J/∂d_m ;

then, the necessary optimality conditions are approximated by the following termination conditions:

    |dJ_min − dJ_max| ≤ ε   and   ∂J/∂d_m ≥ dJ_max   if d_m = 0.

In other words, we consider that we are at the optimum when the gradient components for all positive d_m lie in an ε-tube and when all gradient components for vanishing d_m are above this tube. Note that these approximate necessary optimality conditions are available right away for any differentiable objective function J(d).
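Both stopping tests only require quantities that are already available at each iteration: the per-kernel quadratic forms for criterion (14), and the gradient components for the approximate KKT conditions. A minimal sketch, under the assumption that α and the gradient come from routines such as those sketched earlier:

```python
import numpy as np

def duality_gap_criterion(gram_list, d, alpha, y, eps=1e-2):
    """Stopping test of eq. (14): max_m a^T K_m a - a^T K(d) a <= eps, with a = alpha * y."""
    d = np.asarray(d, dtype=float)
    ay = alpha * y
    quad = np.array([ay @ K_m @ ay for K_m in gram_list])
    return quad.max() - quad @ d <= eps

def approximate_kkt_criterion(d, grad, eps=1e-2, tol=1e-12):
    """Approximate KKT test: gradients of active kernels lie in an eps-tube,
    gradients of inactive kernels (d_m = 0) sit above that tube."""
    active = d > tol
    dJ_min, dJ_max = grad[active].min(), grad[active].max()
    return (dJ_max - dJ_min <= eps) and np.all(grad[~active] >= dJ_max)
```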

Figure 1: Illustration of three iterations of the SILP algorithm and of a gradient descent algorithm on a one-dimensional problem. This dimensionality is not representative of the MKL framework, but our aim is to illustrate the typical oscillations of cutting planes around the optimal solution (with iterates d^1 to d^3). Note that computing an affine lower bound at a given d requires a gradient computation. Provided the step size is chosen correctly, gradient descent converges directly towards the optimal solution without overshooting (from d^0 to d^1).

3.4 Cutting Planes, Steepest Descent and Computational Complexity

As we stated in the introduction, several algorithms have been proposed for solving the original MKL problem defined by Lanckriet et al. (2004b). All these algorithms are based on equivalent formulations of the same dual problem; they all aim at providing a pair of optimal vectors (d, α). In this subsection, we contrast SimpleMKL with its closest relative, the SILP algorithm of Sonnenburg et al. (2005, 2006). Indeed, from an implementation point of view, the two algorithms are alike, since they both wrap a standard single kernel SVM algorithm. This feature makes both algorithms very easy to implement. They however differ in computational efficiency, because the kernel weights d_m are optimized in quite different ways, as detailed below.

Let us first recall that our differentiable function J(d) is defined as

    J(d) =  max_α   −(1/2) Σ_{i,j} α_i α_j y_i y_j Σ_m d_m K_m(x_i, x_j) + Σ_i α_i
            with    Σ_i α_i y_i = 0,   0 ≤ α_i ≤ C   ∀i,

and that both algorithms aim at minimizing this differentiable function. However, using a SILP approach in this case does not take advantage of the smoothness of the objective function.

The SILP algorithm of Sonnenburg et al. (2006) is a cutting plane method to minimize J with respect to d. For each value of d, the best α is found and leads to an affine lower bound on J(d). The number of lower-bounding affine functions increases as more (d, α) pairs are computed, and the next candidate vector d is the minimizer of the current lower bound on

J(d), that is, the maximum over all the affine functions. Cutting plane methods do converge, but they are known for their instability, notably when the number of lower-bounding affine functions is small: the approximation of the objective function is then loose and the iterates may oscillate (Bonnans et al., 2003). Our steepest descent approach, with the proposed line search, does not suffer from instability since we have a differentiable function to minimize. Figure 1 illustrates the behaviour of both algorithms in a simple case, with oscillations for cutting planes and direct convergence for gradient descent. Section 5 evaluates how these oscillations impact the computation time of the SILP algorithm on several genuine examples.

These experiments show that our algorithm requires fewer of the costly gradient computations. Conversely, the line search in the gradient-based approach requires more SVM retrainings in the process of querying the objective function. However, the computation time per SVM training is considerably reduced, since the gradient-based approach produces estimates of d on a smooth trajectory, so that the previous SVM solution provides a good guess for the current SVM training. In SILP, with the oscillating subsequent approximations of d, the benefit of warm-start training severely decreases.

3.5 Convergence Analysis

In this paragraph, we briefly discuss the convergence of the algorithm we propose. We first suppose that problem (11) is always exactly solved, which means that the duality gap of this problem is 0. Under such conditions, the gradient computation in (13) is exact, and thus our algorithm performs reduced gradient descent on a continuously differentiable function J(·) (remember that we have assumed that the kernel matrices are positive definite) defined on the simplex {d | Σ_m d_m = 1, d_m ≥ 0}, which does converge to the global minimum of J (Luenberger, 1984).

However, in practice, problem (11) is not solved exactly since most SVM algorithms will stop when the duality gap is smaller than a given ε. In this case, the convergence of our projected gradient method is no longer guaranteed by standard arguments. Indeed, the output of the approximately solved SVM leads only to an ε-subgradient (Bonnans et al., 2003, Bach et al., 2004a). This situation is more difficult to analyze and we plan to address it thoroughly in future work (see for instance d'Aspremont, 2006, for an example of such analysis in a similar context).

4. Extensions

In this section, we discuss how the algorithm we propose can be simply extended to other SVM algorithms like SVM regression, one-class SVM or pairwise multiclass SVM algorithms. More generally, we will discuss other loss functions that can be used within our MKL algorithm.

4.1 Extensions to other SVM Algorithms

The algorithm described in the previous section focuses on binary classification SVMs, but it is worth noting that our MKL algorithm can be extended to other SVM algorithms with only little change. For SVM regression with the ε-insensitive loss, or clustering with the one-class soft margin loss, the problem only changes in the definition of the objective function J(d) in (10).

For SVM regression (Vapnik et al., 1997, Schölkopf and Smola, 2001), we have

    J(d) =  min_{{f_m},b,ξ}   (1/2) Σ_m (1/d_m) ‖f_m‖²_{H_m} + C Σ_i (ξ_i + ξ*_i)
            s.t.   y_i − Σ_m f_m(x_i) − b ≤ ε + ξ_i    ∀i,
                   Σ_m f_m(x_i) + b − y_i ≤ ε + ξ*_i   ∀i,                             (15)
                   ξ_i ≥ 0,  ξ*_i ≥ 0                  ∀i,

and for one-class SVMs (Schölkopf and Smola, 2001), we have

    J(d) =  min_{{f_m},b,ξ}   (1/2) Σ_m (1/d_m) ‖f_m‖²_{H_m} + (1/(νl)) Σ_i ξ_i − b
            s.t.   Σ_m f_m(x_i) ≥ b − ξ_i   ∀i,   ξ_i ≥ 0   ∀i.                        (16)

Again, J(d) can be defined according to the dual functions of these two optimization problems, which are respectively

    J(d) =  max_{α,β}   Σ_i (β_i − α_i) y_i − ε Σ_i (β_i + α_i)
                        − (1/2) Σ_{i,j} (β_i − α_i)(β_j − α_j) Σ_m d_m K_m(x_i, x_j)
            with    Σ_i (β_i − α_i) = 0,   0 ≤ α_i, β_i ≤ C   ∀i,                      (17)

and

    J(d) =  max_α   −(1/2) Σ_{i,j} α_i α_j Σ_m d_m K_m(x_i, x_j)
            with    0 ≤ α_i ≤ 1/(νl)   ∀i,   Σ_i α_i = 1,                              (18)

where {α_i} and {β_i} are Lagrange multipliers. Then, as long as J(d) is differentiable, a property strictly related to the strict concavity of its dual function, our descent algorithm can still be applied. The main effort for the extension of our algorithm is the evaluation of J(d) and the computation of its derivatives. Similarly to the binary classification SVM, J(d) can be computed by means of efficient off-the-shelf SVM solvers, and the gradient of J(d) is easily obtained through the dual problems. For SVM regression, we have

    ∂J/∂d_m = −(1/2) Σ_{i,j} (β*_i − α*_i)(β*_j − α*_j) K_m(x_i, x_j),                 (19)

and for one-class SVMs, we have

    ∂J/∂d_m = −(1/2) Σ_{i,j} α*_i α*_j K_m(x_i, x_j),                                  (20)

where α*_i and β*_i are the optimal values of the Lagrange multipliers.
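The two gradients (19) and (20) are again simple quadratic forms in the optimal multipliers. A short sketch, under the assumption that the multipliers are returned by whichever SVR or one-class solver is wrapped:

```python
import numpy as np

def mkl_gradient_svr(gram_list, alpha, beta):
    """SVM regression gradient, eq. (19): dJ/dd_m = -1/2 (beta-alpha)^T K_m (beta-alpha)."""
    c = beta - alpha                      # difference of the two sets of tube multipliers
    return np.array([-0.5 * c @ K_m @ c for K_m in gram_list])

def mkl_gradient_oneclass(gram_list, alpha):
    """One-class SVM gradient, eq. (20): dJ/dd_m = -1/2 alpha^T K_m alpha."""
    return np.array([-0.5 * alpha @ K_m @ alpha for K_m in gram_list])
```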

These examples illustrate that extending SimpleMKL to other SVM problems is rather straightforward. This observation is also valid for other SVM algorithms (based for instance on the ν parameter, a squared hinge loss or a squared-ε tube) that we do not detail here. Again, our algorithm can be used provided J(d) is differentiable, by plugging into the algorithm the function that evaluates the objective value J(d) and its gradient. Of course, the duality gap may be considered as a stopping criterion if it can be computed.

4.2 Multiclass Multiple Kernel Learning

With SVMs, multiclass problems are customarily solved by combining several binary classifiers. The well-known one-against-all and one-against-one approaches are the two most common ways of building a multiclass decision function based on pairwise decision functions. Multiclass SVMs may also be defined right away as the solution of a global optimization problem (Weston and Watkins, 1999, Crammer and Singer, 2001), which may also be addressed with structured-output SVMs (Tsochantaridis et al., 2005). Very recently, an MKL algorithm based on structured-output SVMs has been proposed by Zien and Ong (2007). This work extends the work of Sonnenburg et al. (2006) to multiclass problems, with an MKL implementation still based on a QCQP or SILP approach.

Several works have compared the performances of multiclass SVM algorithms (Duan and Keerthi, 2005, Hsu and Lin, 2002, Rifkin and Klautau, 2004). In this subsection, we do not deal with this aspect; we explain how SimpleMKL can be extended to pairwise multiclass SVM implementations. The problem of applying our algorithm to structured-output SVMs will be briefly discussed later.

Suppose we have a multiclass problem with P classes. For a one-against-all multiclass SVM, we need to train P binary SVM classifiers, where the p-th classifier is trained by considering all examples of class p as positive examples while all other examples are considered negative. For a one-against-one multiclass problem, P(P−1)/2 binary SVM classifiers are built from all pairs of distinct classes.

Our multiclass MKL extension of SimpleMKL differs from the binary version only in the definition of a new cost function J(d). As we now look for the combination of kernels that jointly optimizes all the pairwise decision functions, the objective function we want to optimize according to the kernel weights {d_m} is

    J(d) = Σ_{p ∈ P} J_p(d),

where P is the set of all pairs to be considered, and J_p(d) is the binary SVM objective value for the classification problem pertaining to pair p. Once the new objective function is defined, the lines of Algorithm 1 still apply. The gradient of J(d) is still very simple to obtain since, owing to linearity, we have

    ∂J/∂d_m = −(1/2) Σ_{p ∈ P} Σ_{i,j} α*_{i,p} α*_{j,p} y_i y_j K_m(x_i, x_j),         (21)

where α*_{j,p} is the Lagrange multiplier of the j-th example involved in the p-th decision function. Note that those Lagrange multipliers can be obtained independently for each pair.
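Owing to the linearity in (21), the multiclass gradient is just the sum of the binary gradients. The sketch below assumes a one-against-all decomposition in which every binary sub-problem involves all training examples; for one-against-one, each Gram matrix would first have to be restricted to the examples of the corresponding pair (this layout is an assumption of the illustration, not a prescription from the paper).

```python
import numpy as np

def mkl_gradient_multiclass(gram_list, alphas, ys):
    """Gradient for pairwise multiclass MKL, eq. (21).

    alphas, ys: one (alpha, +/-1 label) vector pair per binary sub-problem,
    each of length l (one-against-all layout assumed)."""
    grad = np.zeros(len(gram_list))
    for alpha_p, y_p in zip(alphas, ys):
        ay = alpha_p * y_p
        grad += np.array([-0.5 * ay @ K_m @ ay for K_m in gram_list])
    return grad
```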

The above described approach aims at finding the combination of kernels that jointly optimizes all binary classification problems: this one set of features should maximize the sum of margins. Another possible and straightforward approach consists in running SimpleMKL independently for each classification task. However, this choice is likely to result in as many combinations of kernels as there are binary classifiers.

4.3 Other loss functions

Multiple kernel learning has attracted a great deal of interest and, since the seminal work of Lanckriet et al. (2004b), several works on this topic have flourished. For instance, multiple kernel learning has been transposed to least-squares fitting and logistic regression (Bach et al., 2004b). Independently, several authors have applied mixed-norm regularization, such as the additive spline regression model of Grandvalet and Canu (1999). This type of regularization, which is now known as the group lasso, may be seen as a linear version of multiple kernel learning (Bach, 2007).

Several algorithms have been proposed for solving the group lasso problem. Some of them are based on projected gradient or on coordinate descent algorithms. However, they all consider the non-smooth version of the problem. We previously mentioned that Zien and Ong (2007) have proposed an MKL algorithm based on structured-output SVMs. For such a problem, the loss function, which differs from the usual SVM hinge loss, leads to an algorithm based on cutting planes instead of the usual QP approach.

Provided the gradient of the objective value can be obtained, our algorithm can be applied to the group lasso and to structured-output SVMs. The key point is whether the theorem of Bonnans et al. (2003) can be applied or not. Although we have not deeply investigated this point, we think that many problems comply with this requirement, but we leave these developments for future work.

4.4 Approximate regularization path

SimpleMKL requires the setting of the usual SVM hyperparameter C, which usually needs to be tuned for the problem at hand. For doing so, a practical and useful technique is to compute the so-called regularization path, which describes the set of solutions as C varies from 0 to ∞. Exact path following techniques have been derived for some specific problems like SVMs or the lasso (Hastie et al., 2004, Efron et al., 2004). Besides, regularization paths can be sampled by predictor-corrector methods (Rosset, 2004, Bach et al., 2004b).

For model selection purposes, an approximation of the regularization path may be sufficient. This approach has been applied for instance by Koh et al. (2007) in regularized logistic regression. Here, we compute an approximate regularization path based on a warm-start technique. Suppose that, for a given value of C, we have computed the optimal pair (d*, α*); the idea of a warm start is to use this solution for initializing another MKL problem with a different value of C. In our case, we iteratively compute the solutions for decreasing values of C (note that α* has to be modified to be a feasible initialization of the more constrained SVM problem).
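The following is a hedged sketch of this warm-started sweep over decreasing C values. Here solve_mkl is a hypothetical wrapper around the SimpleMKL iteration that accepts a starting point, and clipping α to the new box constraint is just one simple way of restoring feasibility for the smaller C (the solver is assumed to re-enforce Σ_i α_i y_i = 0).

```python
import numpy as np

def approximate_regularization_path(gram_list, y, C_values, solve_mkl):
    """Warm-started sweep over decreasing C values (Section 4.4).

    solve_mkl(gram_list, y, C, d_init, alpha_init) is assumed to run the SimpleMKL
    iteration from the given starting point and return the optimal (d, alpha)."""
    C_values = np.sort(np.asarray(C_values, dtype=float))[::-1]   # start from the largest C
    M = len(gram_list)
    d, alpha = np.ones(M) / M, None
    path = []
    for C in C_values:
        if alpha is not None:
            alpha = np.minimum(alpha, C)        # clip to the new box constraint 0 <= alpha_i <= C
        d, alpha = solve_mkl(gram_list, y, C, d_init=d, alpha_init=alpha)
        path.append((C, d.copy(), alpha.copy()))
    return path
```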

The warm-start technique can be used as is, but some simple tricks can further improve the initialization. For instance, suppose we have computed the pair (d*, α*) that optimizes an MKL problem for a given C. For a different hyperparameter value, say C′, the first step of our algorithm (see Algorithm 1) consists in computing an SVM solution with parameter C′ and with the single kernel obtained as the sum of kernels weighted according to d*. Hence this step can be solved by computing the regularization path of an SVM with fixed kernel. Then, once the new optimal value of α (corresponding to the hyperparameter C′) has been computed, the kernel weights can be updated accordingly. Indeed, according to equations (5) and (7), we have

    d_m = ‖f_m‖_{H_m} / Σ_q ‖f_q‖_{H_q}   ∀m,   with   f_m(·) = d_m Σ_i α_i y_i K_m(x_i, ·),

which corresponds to the optimal weight values if the new α were also optimal. This is probably not the case, but it leads to another possible, and probably better, initialization for our warm-start technique.

5. Numerical experiments

In this experimental section, we essentially aim at illustrating three points. The first point is to show that our gradient descent algorithm is efficient. This is achieved through binary classification experiments, where SimpleMKL is compared to the SILP approach of Sonnenburg et al. (2006). Then, we illustrate the usefulness of a multiple kernel learning approach in the context of regression. The examples we use are based on wavelet-based regression, in which the multiple kernel learning framework naturally fits. The final experiment aims at evaluating the multiple kernel approach in a model selection problem for some multiclass problems.

5.1 Computation time

The aim of this first set of experiments is to assess the running times of SimpleMKL.² First, we compare with SILP regarding the time required for computing a single solution of MKL with a given C hyperparameter. Then, we compute an approximate regularization path by varying C values. We finally provide hints on the expected complexity of SimpleMKL, by measuring the growth of running time as the number of examples or kernels increases.

5.1.1 Time needed for reaching a single solution

In this first benchmark, we put SimpleMKL and SILP side by side, for a fixed value of the hyperparameter C (C = 100). This procedure, which does not take into account a proper model selection procedure, is not representative of the typical use of SVMs. It is however relevant for the purpose of comparing algorithmic issues.

2. All the experiments have been run on a Pentium D 3 GHz with 3 GB of RAM.

The evaluation is made on five datasets from the UCI repository: Liver, Wpbc, Ionosphere, Pima, Sonar (Blake and Merz, 1998). The candidate kernels are: Gaussian kernels with different bandwidths σ, on all variables and on each single variable; and polynomial kernels of degree 1 to 3, again on all variables and on each single variable. All kernel matrices have been normalized to unit trace, and are precomputed prior to running the algorithms.

Both SimpleMKL and SILP wrap an SVM dual solver based on SimpleSVM, an active constraints method written in Matlab (Canu et al., 2003). The descent procedure of SimpleMKL is also implemented in Matlab, whereas the linear programming involved in SILP is handled with the publicly available toolbox LPSOLVE (Berkelaar et al., 2004). For a fair comparison, we use the same stopping criterion for both algorithms: they halt when either the duality gap is lower than 0.01, or the number of iterations exceeds 2000. Quantitatively, the displayed results differ from the preliminary version of this work, where the stopping criterion was based on the stabilization of the weights, but they are qualitatively similar (Rakotomamonjy et al., 2007).

For each dataset, the algorithms were run 20 times with different train and test sets (70% of the examples for training and 30% for testing). Training examples were normalized to zero mean and unit variance. In Table 1, we report different performance measures: accuracy, number of selected kernels and running time. As the latter is mainly spent in querying the SVM solver and in computing the gradient of J with respect to d, the number of calls to these two routines is also reported.

Both algorithms are nearly identical in accuracy. Their numbers of selected kernels are of the same magnitude, although SimpleMKL tends to select 10 to 20% more kernels. As both algorithms address the same convex optimization problem, with convergent methods starting from the same initialization, the observed differences are only due to the inaccuracy of the solution when the stopping criterion is met. Hence, the trajectories followed by each algorithm for reaching the solution, detailed in Section 3.4, explain the differences in the number of selected kernels. The updates of d based on the descent algorithm of SimpleMKL are rather conservative (small steps departing from 1/M for all d_m), whereas the oscillations of cutting planes are likely to favor extreme solutions, hitting the edges of the simplex.

This explanation is corroborated by Figure 2, which compares the behavior of the d_m coefficients through time. The instability of SILP is clearly visible, with very high oscillations in the first iterations and a noticeable residual noise in the long run. In comparison, the trajectories for SimpleMKL are much smoother.

If we now look at the overall difference in computation time reported in Table 1, clearly, on all data sets, SimpleMKL is faster than SILP, with an average gain factor of about 5. Furthermore, the larger the number of kernels, the larger the speed gain we achieve. Looking at the last column of Table 1, we see that the main reason for the improvement is that SimpleMKL converges in fewer iterations (that is, gradient computations). It may seem surprising that this gain is not counterbalanced by the fact that SimpleMKL requires many more calls to the SVM solver (on average, about 4 times).

Table 1: Average performance measures for the two MKL algorithms. For each data set (Liver, Pima, Ionosphere, Wpbc, Sonar), the table reports, for SILP and SimpleMKL, the number of selected kernels, the accuracy, the running time in seconds, and the numbers of SVM and gradient evaluations.

Figure 2: Evolution of the five largest weights d_m for SimpleMKL and SILP; left: Pima; right: Ionosphere.

As we stated in Section 3.4, when the number of kernels is large, computing the gradient may be expensive compared to SVM retraining with warm-start techniques. To understand why, with this large number of calls to the SVM solver, SimpleMKL is still much faster than SILP, we have to look back at Figure 2. On the one hand, the large variations in subsequent d_m values for SILP entail that subsequent SVM problems are not likely to have similar solutions: a warm-start call to the SVM solver does not help much. On the other hand, with the smooth trajectories of d_m in SimpleMKL, the previous SVM solution is often a good guess for the current problem: a warm-start call to the SVM solver results in much less computation than a call from scratch.

To end this first series of experiments, Figure 3 depicts the evolution of the objective function for the data sets that were used in Figure 2. Besides the fact that SILP needs more iterations to achieve a good approximation of the final solution, it is worth noting that the objective values rapidly reach their steady state while still being far from convergence, when the d_m values are far from being settled. Thus, monitoring objective values is not suitable for assessing convergence.

Figure 3: Evolution of the objective values for SimpleMKL and SILP; left: Pima; right: Ionosphere.

5.1.2 Time needed for getting an approximate regularization path

In practice, the optimal value of C is unknown, and one has to solve several SVM problems, spanning a wide range of C values, before choosing a solution according to some model selection criterion like the cross-validation error. Here, we further pursue the comparison of the running times of SimpleMKL and SILP, in a series of experiments that include the search for a sensible value of C.

In this new benchmark, we use the same data sets as in the previous experiments, with the same kernel settings. The task only changes in the respect that we now evaluate the running times needed by both algorithms to compute an approximate regularization path.

Figure 4: Regularization paths for d_m and for the number of selected kernels versus C; left: Pima; right: Wpbc.

For both algorithms, we use a simple warm-start technique, which consists in using the optimal solutions {d*_m} and {α*_i} obtained for a given C to initialize a new MKL problem with the next value of C (DeCoste and Wagstaff, 2000). As described in Section 4.4, we start from the largest C and then approximate the regularization path by decreasing its value. The set of C values is obtained by evenly sampling an interval of C values on a logarithmic scale.

Figure 4 shows the variations of the number of selected kernels and of the values of d_m along the regularization path for the Pima and Wpbc datasets. The number of kernels is not a monotone function of C: for small values of C, the number of kernels is somewhat constant, then it rises rapidly. There is a small overshoot before reaching a plateau corresponding to very high values of C. This trend is similar for the number of leading terms in the kernel weight vector d. Both phenomena were observed consistently over the datasets we used.

Table 2 displays the average computation time, over the runs, required for building the approximate regularization path. As previously, SimpleMKL is more efficient than SILP, with a gain factor increasing with the number of kernels in the combination. The range

Table 2: Average computation time (in seconds) for getting an approximate regularization path, reported for SimpleMKL and SILP together with their ratio, on the Liver, Pima, Ionosphere, Wpbc and Sonar data sets. For the Sonar data set, SILP was extremely slow, so that the regularization path was computed only once.

of gain factors, from 5.9 to 23, is even more impressive than in the previous benchmark. SimpleMKL benefits from the continuity of solutions along the regularization path, whereas SILP does not take advantage of warm starts. Even provided with a good initialization, it needs many cutting planes to stabilize.

5.1.3 More on SimpleMKL running times

Here, we provide an empirical assessment of the expected complexity of SimpleMKL on different data sets from the UCI repository. We first look at the situation where the kernel matrices can be pre-computed and stored in memory, before reporting experiments where the memory requirements are too high, leading to repeated kernel evaluations.

In a first set of experiments, we use Gaussian kernels, computed on random subsets of variables and with random widths. These kernels are precomputed and stored in memory, and we report the average CPU running times obtained from 20 runs differing in the random draw of training examples. The stopping criterion is the same as in the previous section: a relative duality gap less than ε = 0.01.

The first two rows of Figure 5 depict the growth of computation time as the number of kernels increases. We observe a nearly linear trend for the four learning problems. This growth rate could be expected considering the linear convergence property of gradient techniques, but the absence of overhead is valuable.

The last row of Figure 5 depicts the growth of computation time as the number of examples increases. Here, the number of kernels is held fixed. In these plots, the observed trend is clearly superlinear. Again, this trend could be expected, considering that expected SVM training times are superlinear in the number of training examples. As we already mentioned, the complexity of SimpleMKL is tightly linked to that of SVM training (for some examples of single kernel SVM running times, one can refer to the work of Loosli and Canu, 2007).

When all the kernels used for MKL cannot be stored in memory, one can resort to a decomposition method. Table 3 reports the average computation times, over the runs, in this more difficult situation. The large-scale SVM scheme of Joachims (1999) has been implemented, with basis kernels recomputed whenever needed. This approach is computationally expensive but goes with no memory limit. For these experiments, the stopping criterion is


More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Research Article Robust ε-support Vector Regression

Research Article Robust ε-support Vector Regression Matheatical Probles in Engineering, Article ID 373571, 5 pages http://dx.doi.org/10.1155/2014/373571 Research Article Robust ε-support Vector Regression Yuan Lv and Zhong Gan School of Mechanical Engineering,

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space Journal of Machine Learning Research 3 (2003) 1333-1356 Subitted 5/02; Published 3/03 Grafting: Fast, Increental Feature Selection by Gradient Descent in Function Space Sion Perkins Space and Reote Sensing

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

UNIVERSITY OF TRENTO ON THE USE OF SVM FOR ELECTROMAGNETIC SUBSURFACE SENSING. A. Boni, M. Conci, A. Massa, and S. Piffer.

UNIVERSITY OF TRENTO ON THE USE OF SVM FOR ELECTROMAGNETIC SUBSURFACE SENSING. A. Boni, M. Conci, A. Massa, and S. Piffer. UIVRSITY OF TRTO DIPARTITO DI IGGRIA SCIZA DLL IFORAZIO 3823 Povo Trento (Italy) Via Soarive 4 http://www.disi.unitn.it O TH US OF SV FOR LCTROAGTIC SUBSURFAC SSIG A. Boni. Conci A. assa and S. Piffer

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL paper prepared for the 1996 PTRC Conference, Septeber 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL Nanne J. van der Zijpp 1 Transportation and Traffic Engineering Section Delft University

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Introduction to Kernel methods

Introduction to Kernel methods Introduction to Kernel ethods ML Workshop, ISI Kolkata Chiranjib Bhattacharyya Machine Learning lab Dept of CSA, IISc chiru@csa.iisc.ernet.in http://drona.csa.iisc.ernet.in/~chiru 19th Oct, 2012 Introduction

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization

Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization The Thirty-Second AAAI Conference on Artificial Intelligence AAAI-8) Inexact Proxial Gradient Methods for Non-Convex and Non-Sooth Optiization Bin Gu, De Wang, Zhouyuan Huo, Heng Huang * Departent of Electrical

More information

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES ICONIC 2007 St. Louis, O, USA June 27-29, 2007 HIGH RESOLUTION NEAR-FIELD ULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR ACHINES A. Randazzo,. A. Abou-Khousa 2,.Pastorino, and R. Zoughi

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison yströ Method vs : A Theoretical and Epirical Coparison Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou Machine Learning Lab, GE Global Research, San Raon, CA 94583 Michigan State University,

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

On the Impact of Kernel Approximation on Learning Accuracy

On the Impact of Kernel Approximation on Learning Accuracy On the Ipact of Kernel Approxiation on Learning Accuracy Corinna Cortes Mehryar Mohri Aeet Talwalkar Google Research New York, NY corinna@google.co Courant Institute and Google Research New York, NY ohri@cs.nyu.edu

More information

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007 Deflation of the I-O Series 1959-2. Soe Technical Aspects Giorgio Rapa University of Genoa g.rapa@unige.it April 27 1. Introduction The nuber of sectors is 42 for the period 1965-2 and 38 for the initial

More information

The Methods of Solution for Constrained Nonlinear Programming

The Methods of Solution for Constrained Nonlinear Programming Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 01-06 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.co The Methods of Solution for Constrained

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

PAC-Bayesian Learning of Linear Classifiers

PAC-Bayesian Learning of Linear Classifiers Pascal Gerain Pascal.Gerain.@ulaval.ca Alexandre Lacasse Alexandre.Lacasse@ift.ulaval.ca François Laviolette Francois.Laviolette@ift.ulaval.ca Mario Marchand Mario.Marchand@ift.ulaval.ca Départeent d inforatique

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

An Algorithm for Posynomial Geometric Programming, Based on Generalized Linear Programming

An Algorithm for Posynomial Geometric Programming, Based on Generalized Linear Programming An Algorith for Posynoial Geoetric Prograing, Based on Generalized Linear Prograing Jayant Rajgopal Departent of Industrial Engineering University of Pittsburgh, Pittsburgh, PA 526 Dennis L. Bricer Departent

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Optical Properties of Plasmas of High-Z Elements

Optical Properties of Plasmas of High-Z Elements Forschungszentru Karlsruhe Techni und Uwelt Wissenschaftlishe Berichte FZK Optical Properties of Plasas of High-Z Eleents V.Tolach 1, G.Miloshevsy 1, H.Würz Project Kernfusion 1 Heat and Mass Transfer

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

An improved self-adaptive harmony search algorithm for joint replenishment problems

An improved self-adaptive harmony search algorithm for joint replenishment problems An iproved self-adaptive harony search algorith for joint replenishent probles Lin Wang School of Manageent, Huazhong University of Science & Technology zhoulearner@gail.co Xiaojian Zhou School of Manageent,

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information