Chapter 1 Decomposition methods for Support Vector Machines

Support Vector Machines (SVM) are widely used as a simple and efficient tool for linear and nonlinear classification as well as for regression problems. The basic training principle of SVM, motivated by statistical learning theory, is that the expected classification error for unseen test samples is minimized, so that SVM define good predictive models. In this chapter we first analyze linear classifiers, and we introduce the concept of optimal separating hyperplane, which characterizes SVM models. By Wolfe's duality theory we show that training a linear SVM for classification leads to solving a convex quadratic programming problem with one linear constraint and box constraints. Then we extend the approach to nonlinear SVM for classification. This extension requires the introduction of the so-called kernel functions. Both linear and nonlinear SVM training require the solution of a convex quadratic programming problem. We present optimization methods based on decomposition techniques. The adoption of decomposition techniques is motivated by the fact that, in real applications, the number of training data may be huge, so that the Hessian matrix cannot be stored. We focus on the widely used Sequential Minimal Optimization (SMO) algorithms, which update two variables at each iteration, and we state theoretical convergence results.

1.1 Notation

The Training Set (TS) is a set of observations:

$$TS = \{(x^i, y_i),\ x^i \in X \subseteq R^n,\ y_i \in Y \subseteq R,\ i = 1,\ldots,\ell\}.$$

The vectors $x^i$ are the patterns belonging to the input space. The scalars $y_i$ are the labels (targets). In a classification problem we have $y_i \in \{-1, 1\}$; in a regression problem $y_i \in R$.

1.2 Linear classifiers and the optimal separating hyperplane

Let us consider two disjoint sets A and B of points in $R^n$ to be classified. Assume that A and B are linearly separable, that is, there exists a hyperplane $H = \{x \in R^n : w^T x + b = 0\}$ such that the points $x^i \in A$ belong to one half space and the points $x^j \in B$ belong to the other half space. Then, there exist a vector $w \in R^n$ and a scalar $b \in R$ such that

$$w^T x^i + b \ge \varepsilon \ \ \forall x^i \in A, \qquad w^T x^j + b \le -\varepsilon \ \ \forall x^j \in B, \qquad (1.1)$$

where $\varepsilon > 0$. Dividing by $\varepsilon$ we can write

$$w^T x^i + b \ge 1 \ \ \forall x^i \in A, \qquad w^T x^j + b \le -1 \ \ \forall x^j \in B. \qquad (1.2)$$

A hyperplane will be indicated by $H(w,b)$. We say that $H(w,b)$ is a separating hyperplane if the pair $(w,b)$ satisfies (1.2). The decision function of a linear classifier associated with a separating hyperplane is $f(x) = \mathrm{sgn}(w^T x + b)$.

We introduce the concept of margin of a separating hyperplane (see Fig. 1.1).

Definition 1.1. Let $H(w,b)$ be a separating hyperplane. The margin of $H(w,b)$ is the minimum distance $\rho$ between the points in $A \cup B$ and the hyperplane $H(w,b)$, that is

$$\rho(w,b) = \min_{x^i \in A \cup B} \left\{ \frac{|w^T x^i + b|}{\|w\|} \right\}.$$

[Fig. 1.1: Margin of a separating hyperplane]

It is quite intuitive that the margin of a given separating hyperplane is related to the generalization capability of the corresponding linear classifier. For instance, observing Figure 1.2, we may expect that the hyperplane $H(w,b)$ leads to a better linear classifier than the one associated with the hyperplane $H(\hat w, \hat b)$.
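As a minimal numerical sketch of Definition 1.1 (the data and the hyperplane below are illustrative, not taken from the text):

```python
import numpy as np

def margin(w, b, points):
    """Margin of the separating hyperplane H(w, b) over a set of points
    (Definition 1.1): the minimum of |w^T x + b| / ||w||."""
    w = np.asarray(w, dtype=float)
    return min(abs(w @ x + b) / np.linalg.norm(w) for x in points)

# Illustrative linearly separable sets in R^2.
A = [np.array([2.0, 2.0]), np.array([3.0, 1.0])]
B = [np.array([-1.0, -1.0]), np.array([-2.0, 0.0])]

# H(w, b) with w = (1, 1), b = 0 separates A and B; the margin is
# attained at the points of B closest to the hyperplane.
print(margin([1.0, 1.0], 0.0, A + B))  # sqrt(2) ~ 1.414
```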

[Fig. 1.2: Two separating hyperplanes with different margins]

The relationship between the margin and the generalization capability of linear classifiers is analyzed by statistical learning theory, which theoretically motivates the importance of defining the hyperplane with maximum margin, the so-called optimal separating hyperplane (see Fig. 1.3).

[Fig. 1.3: The optimal hyperplane]

Definition 1.2. Given two linearly separable sets A and B, the optimal separating hyperplane is a separating hyperplane $H(w^*, b^*)$ having maximum margin.

It can be proved that the optimal hyperplane exists and is unique (the formal proof is reported in Appendix A). From the above definition we get that the optimal hyperplane is a solution of the following problem

$$\max_{w \in R^n,\, b \in R} \ \min_{x^i \in A \cup B} \left\{ \frac{|w^T x^i + b|}{\|w\|} \right\} \qquad (1.3)$$

The idea underlying the proof of existence and uniqueness of the optimal hyperplane is based on the following steps:

- for each separating hyperplane $H(w, b)$, there exists a separating hyperplane $H(\hat w, \hat b)$ such that

$$\frac{1}{\|w\|} \le \rho(w,b) \le \frac{1}{\|\hat w\|};$$

- the above condition implies that problem (1.3) admits a solution provided that the following problem

$$\max_{w \in R^n,\, b \in R} \ \frac{1}{\|w\|} \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B \qquad (1.4)$$

admits a solution;

- problem (1.4) is obviously equivalent to

$$\min_{w \in R^n,\, b \in R} \ \|w\|^2 \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B, \qquad (1.5)$$

and we can prove that it admits a unique solution, which is also the unique solution of (1.3).

1.3 Linear SVM

We present the linear classifiers defined by SVM in the case both of linearly separable and of not linearly separable sets.

1.3.1 The case of linearly separable sets

Given two linearly separable sets A and B of points in $R^n$, among the infinitely many linear classifiers corresponding to the infinitely many separating hyperplanes, a linear SVM corresponds to the linear classifier whose decision surface is the optimal separating hyperplane. Then, as already seen, the training of a linear SVM requires determining the optimal separating hyperplane, that is, solving the problem

$$\max \ \rho(w,b) \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B, \qquad (1.6)$$

where $\rho(w,b)$ is the margin. We have shown in Appendix A that problem (1.6) is equivalent to the following convex quadratic programming problem

$$\min \ f(w) = \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B. \qquad (1.7)$$

By using the label $y_i = +1$ for the vectors $x^i \in A$, and the label $y_j = -1$ for the vectors $x^j \in B$, problem (1.7) takes the form

$$\min \ f(w) = \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\left[w^T x^i + b\right] - 1 \ge 0, \quad i = 1,\ldots,\ell, \qquad (1.8)$$

where $\ell$ is the total number of training points. We will consider the Wolfe dual formulation (see Appendix B) of problem (1.8) for the following reasons:

- the constraints of (1.8) will be replaced by simpler constraints on the Lagrange multipliers;
- in the dual formulation we have inner products between the training vectors, and this will allow us to easily extend the training procedure to the case of nonseparable sets.

The Lagrangian of (1.8) is the following function

$$L(w,b,\lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{\ell} \lambda_i\left[y_i(w^T x^i + b) - 1\right]. \qquad (1.9)$$

The Wolfe dual problem of (1.8) takes the form

$$\max \ L(w,b,\lambda) \quad \text{s.t.} \quad \nabla_w L(w,b,\lambda) = 0, \quad \frac{\partial L(w,b,\lambda)}{\partial b} = 0, \quad \lambda \ge 0,$$

that is

$$\begin{aligned} \max \ & L(w,b,\lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{\ell} \lambda_i\left[y_i(w^T x^i + b) - 1\right] \\ \text{s.t.}\ & w = \sum_{i=1}^{\ell} \lambda_i y_i x^i \\ & \sum_{i=1}^{\ell} \lambda_i y_i = 0 \\ & \lambda_i \ge 0, \quad i = 1,\ldots,\ell. \end{aligned} \qquad (1.10)$$

The maximization problem (1.10) can be written as follows

$$\max \ S(\lambda) = -\frac{1}{2} \sum_{i=1}^{\ell}\sum_{j=1}^{\ell} y_i y_j (x^i)^T x^j \lambda_i \lambda_j + \sum_{i=1}^{\ell} \lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{\ell} \lambda_i y_i = 0, \quad \lambda_i \ge 0, \ i=1,\ldots,\ell, \qquad (1.11)$$

or, equivalently, as a minimization problem of the form

$$\min \ \Gamma(\lambda) = \frac{1}{2} \sum_{i=1}^{\ell}\sum_{j=1}^{\ell} y_i y_j (x^i)^T x^j \lambda_i \lambda_j - \sum_{i=1}^{\ell} \lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{\ell} \lambda_i y_i = 0, \quad \lambda_i \ge 0, \ i=1,\ldots,\ell. \qquad (1.12)$$

We observe that:

- the existence of the optimal solution $(w^*, b^*)$ of (1.8), the linearity of the constraints and Proposition 1.7 ensure that problem (1.11) admits at least a solution $\lambda^*$;
- from (1.10) we get that the vector $w^*$ can be determined as follows
$$w^* = \sum_{i=1}^{\ell} \lambda_i^* y_i x^i;$$
- $w^*$ depends only on the so-called support vectors, i.e., the vectors $x^i$ whose corresponding multipliers $\lambda_i^*$ are not null;

- assertion (iii) of Proposition 1.8 ensures that $(w^*, b^*, \lambda^*)$ is a pair (optimal solution, vector of Lagrange multipliers), and hence satisfies the following complementarity conditions
$$\lambda_i^*\left[y_i\left((w^*)^T x^i + b^*\right) - 1\right] = 0, \quad i = 1,\ldots,\ell; \qquad (1.13)$$
- once $w^*$ has been computed, by considering any multiplier $\lambda_i^* \ne 0$, the scalar $b^*$ can be determined by means of the corresponding complementarity condition defined by (1.13);
- problem (1.12) is a convex quadratic programming problem; indeed, setting $X = \left[y_1 x^1, \ldots, y_\ell x^\ell\right]$ and $\lambda^T = \left[\lambda_1, \ldots, \lambda_\ell\right]$, the problem takes the form
$$\min \ \Gamma(\lambda) = \frac{1}{2}\lambda^T X^T X \lambda - e^T\lambda \quad \text{s.t.} \quad \sum_{i=1}^{\ell}\lambda_i y_i = 0, \quad \lambda_i \ge 0, \ i = 1,\ldots,\ell,$$
where $e^T = [1,\ldots,1]$;
- the decision function is
$$f(x) = \mathrm{sgn}\left((w^*)^T x + b^*\right) = \mathrm{sgn}\left(\sum_{i=1}^{\ell} \lambda_i^* y_i (x^i)^T x + b^*\right).$$

[Figures: the optimal hyperplane; the optimal hyperplane and its support vectors]
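As a sketch, assuming the multipliers $\lambda^*$ have been obtained from any QP solver applied to (1.12) (function names below are ours, not from the text):

```python
import numpy as np

def primal_from_dual(lam, X, y):
    """Recover (w*, b*) from a dual solution of (1.12).
    lam: optimal multipliers lambda*, shape (l,)
    X:   training patterns as rows, shape (l, n)
    y:   labels in {-1, +1}, shape (l,)"""
    w = (lam * y) @ X                 # w* = sum_i lambda*_i y_i x^i
    sv = np.argmax(lam)               # any index with lambda*_i != 0
    b = y[sv] - w @ X[sv]             # from (1.13): y_i((w*)^T x^i + b*) = 1
    return w, b

def decision(x, w, b):
    """Linear SVM decision function f(x) = sgn((w*)^T x + b*)."""
    return np.sign(w @ x + b)
```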

[Fig. 1.4: Nonseparable sets: the point $x^a$ is a misclassified point with $\xi_a > 1$; the point $x^b$ is a correctly classified point, but belongs to the separation zone, and hence $0 < \xi_b < 1$]

1.3.2 The case of nonlinearly separable sets

Now assume that the two sets A and B are not linearly separable. This means that the system of linear inequalities

$$w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B \qquad (1.14)$$

does not admit a solution. Let us introduce the slack variables $\xi_h$, with $h = 1,\ldots,\ell$:

$$w^T x^i + b \ge 1 - \xi_i \ \forall x^i \in A, \quad w^T x^j + b \le -1 + \xi_j \ \forall x^j \in B, \quad \xi_h \ge 0, \ h = 1,\ldots,\ell. \qquad (1.15)$$

Note that whenever a vector $x^i$ is not correctly classified the corresponding variable $\xi_i$ is greater than 1. The variables $\xi_i$ corresponding to vectors correctly classified but belonging to the separation zone (see Fig. 1.4) are such that $0 < \xi_i < 1$. Therefore, the term $\sum_{i=1}^{\ell}\xi_i$ is an upper bound on the number of classification errors on the training vectors. Therefore, it is quite natural to add to the objective function of problem (1.8) the term $C\sum_{i=1}^{\ell}\xi_i$, where $C > 0$ is a parameter weighting the training error. The primal

problem becomes

$$\begin{aligned} \min \ & f(w,\xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{\ell}\xi_i \\ \text{s.t.}\ & y_i\left[w^T x^i + b\right] - 1 + \xi_i \ge 0, \quad i=1,\ldots,\ell \\ & \xi_i \ge 0, \quad i=1,\ldots,\ell. \end{aligned} \qquad (1.16)$$

The Wolfe dual of (1.16) is

$$\begin{aligned} \max \ & L(w,b,\xi,\lambda,\mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{\ell}\xi_i - \sum_{i=1}^{\ell}\lambda_i\left[y_i(w^T x^i + b) - 1 + \xi_i\right] - \sum_{i=1}^{\ell}\mu_i\xi_i \\ \text{s.t.}\ & w = \sum_{i=1}^{\ell}\lambda_i y_i x^i, \quad \sum_{i=1}^{\ell}\lambda_i y_i = 0, \quad C - \lambda_i - \mu_i = 0, \ i=1,\ldots,\ell, \quad \lambda \ge 0, \ \mu \ge 0, \end{aligned}$$

which can be equivalently written in the form

$$\min \ \Gamma(\lambda) = \frac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell} y_i y_j (x^i)^T x^j \lambda_i\lambda_j - \sum_{i=1}^{\ell}\lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{\ell}\lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \ i=1,\ldots,\ell. \qquad (1.17)$$

Note that the constraints $\lambda_i \le C$, for $i = 1,\ldots,\ell$, follow from the constraints $\lambda_i = C - \mu_i$, $\mu_i \ge 0$. Again we observe that:

- the vector $w^*$ is $w^* = \sum_{i=1}^{\ell}\lambda_i^* y_i x^i$;
- for the optimal solution $(w^*, b^*, \xi^*, \lambda^*, \mu^*)$ the following complementarity conditions hold

$$\lambda_i^*\left[y_i\left((w^*)^T x^i + b^*\right) - 1 + \xi_i^*\right] = 0, \quad i=1,\ldots,\ell, \qquad (1.18)$$
$$\mu_i^*\,\xi_i^* = 0, \quad i=1,\ldots,\ell; \qquad (1.19)$$

- given $w^*$ and any multiplier $\lambda_i^*$ such that $0 < \lambda_i^* < C$, the scalar $b^*$ can be determined by means of the corresponding condition defined by (1.18);
- whenever $\lambda_i^* \in \{0, C\}$ for all $i = 1,\ldots,\ell$, we say that the solution is degenerate;
- problem (1.17) is a convex quadratic programming problem;
- the decision function of the classifier is
$$f(x) = \mathrm{sgn}\left((w^*)^T x + b^*\right) = \mathrm{sgn}\left(\sum_{i=1}^{\ell}\lambda_i^* y_i (x^i)^T x + b^*\right).$$

1.4 Nonlinear SVM

Linear models may not be rich enough to capture nonlinear patterns in the data. The motivation for introducing nonlinear SVM is to obtain a nonlinear decision boundary for problems where the data distributions are inherently nonlinear. The idea underlying nonlinear SVM is that of mapping the data of the input space onto a higher dimensional space, called feature space, and of defining a linear classifier in the feature space (see Fig. 1.5).

[Fig. 1.5: Mapping the data from the input space onto the feature space, where $w^T\phi(x) + b = 0$]

Let us consider a mapping $\phi : R^n \to H$ where $H$ is a Euclidean space (the feature space) whose dimension is greater than $n$ (the dimension can even be infinite). The input training vectors $x^i$ are mapped onto $\phi(x^i)$, with $i = 1,\ldots,\ell$. We can then define a linear SVM in the feature space by replacing $x^i$ with $\phi(x^i)$. Then we have

- the dual problem (1.17) is replaced by the following problem

$$\min \ \Gamma(\lambda) = \frac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell} y_i y_j\,\phi(x^i)^T\phi(x^j)\,\lambda_i\lambda_j - \sum_{i=1}^{\ell}\lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{\ell}\lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \ i=1,\ldots,\ell; \qquad (1.20)$$

- the vector $w^*$ is $w^* = \sum_{i=1}^{\ell}\lambda_i^* y_i\,\phi(x^i)$;
- given $w^*$ and any $0 < \lambda_i^* < C$, the scalar $b^*$ can be determined using the complementarity conditions

$$y_i\left(\sum_{j=1}^{\ell}\lambda_j^* y_j\,\phi(x^j)^T\phi(x^i) + b^*\right) = 1; \qquad (1.21)$$

- the decision function takes the form

$$f(x) = \mathrm{sgn}\left((w^*)^T\phi(x) + b^*\right). \qquad (1.22)$$

From (1.22) we get that the separation surface is:
- linear in the feature space;
- nonlinear in the input space.

It is important to observe that both in the dual formulation (1.20) and in formula (1.22) concerning the decision function it is not necessary to explicitly know the mapping $\phi$; it is sufficient to know the inner product $\phi(x)^T\phi(z)$ of the feature space. This leads to the fundamental concept of kernel function.

Definition 1.3. Given a set $X \subseteq R^n$, a function $K : X \times X \to R$ is a kernel if

$$K(x,y) = \langle \phi(x), \phi(y) \rangle \quad \forall x, y \in X, \qquad (1.23)$$

where $\phi$ is a mapping $X \to H$ and $H$ is a Euclidean space, that is, a linear space with a fixed inner product.

We observe that a kernel is necessarily a symmetric function. It can be proved that $K(x,z)$ is a kernel if and only if the matrix

$$\left(K(x^i,x^j)\right)_{i,j=1}^{\ell} = \begin{pmatrix} K(x^1,x^1) & \ldots & K(x^1,x^\ell) \\ \vdots & & \vdots \\ K(x^\ell,x^1) & \ldots & K(x^\ell,x^\ell) \end{pmatrix}$$

is positive semidefinite for any set of training vectors $\{x^1,\ldots,x^\ell\}$. In the literature such kernels are often called Mercer kernels.

Proposition 1.1. Let $K : X \times X \to R$ be a symmetric function. Then $K$ is a kernel if and only if, for any choice of the vectors $x^1,\ldots,x^\ell$ in $X$, the matrix

$$K = [K(x^i,x^j)]_{i,j=1,\ldots,\ell}$$

is positive semidefinite.

Using the definition of kernel, problem (1.20) can be written as follows

$$\min \ \Gamma(\lambda) = \frac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell} y_i y_j K(x^i,x^j)\lambda_i\lambda_j - \sum_{i=1}^{\ell}\lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{\ell}\lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \ i=1,\ldots,\ell. \qquad (1.24)$$

By Proposition 1.1 it follows that problem (1.24) is a convex quadratic programming problem. Examples of kernel functions are:

- $K(x,z) = (x^T z + 1)^p$, polynomial kernel ($p$ integer $\ge 1$);
- $K(x,z) = e^{-\|x - z\|^2/2\sigma^2}$, Gaussian kernel ($\sigma > 0$);
- $K(x,z) = \tanh(\beta x^T z + \gamma)$, hyperbolic tangent kernel (for suitable values of $\beta$ and $\gamma$).

Using the definition of kernel function, the decision function is

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{\ell}\lambda_i^* y_i K(x, x^i) + b^*\right).$$

Remark 1.1. On the Gaussian kernel. It is possible to show that the Gaussian kernel is an inner product in an infinite dimensional space. As a consequence, for sufficiently large values of the parameter $C$ the training error is zero.
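A minimal numerical sketch (illustrative, not part of the original text) of two of the kernels listed above, together with the positive semidefiniteness check of Proposition 1.1; the hyperbolic tangent kernel is omitted from the check since it is a kernel only for suitable values of $\beta$ and $\gamma$:

```python
import numpy as np

def poly_kernel(x, z, p=2):
    """Polynomial kernel (x^T z + 1)^p."""
    return (x @ z + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

def gram_matrix(kernel, X):
    """Gram matrix [K(x^i, x^j)] of Proposition 1.1 (patterns are rows of X)."""
    l = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])

# Numerical check of Proposition 1.1 on random data: all eigenvalues of the
# symmetric Gram matrix should be nonnegative (up to rounding).
X = np.random.randn(20, 3)
for k in (poly_kernel, gaussian_kernel):
    print(k.__name__, "min eigenvalue:", np.linalg.eigvalsh(gram_matrix(k, X)).min())
```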

Remark 1.2. On the polynomial kernel. Let us assume that the dimension $n$ of the input space is 2, and let us consider the homogeneous polynomial kernel $K(x,z) = (x^T z)^2$. We show that there are different feature spaces $H$ such that $\phi : R^2 \to H$ and $\phi(x)^T\phi(z) = K(x,z)$. Indeed, both the mapping

$$\phi(x) = \begin{pmatrix} x_1^2 \\ \sqrt{2}\,x_1 x_2 \\ x_2^2 \end{pmatrix} \qquad (1.25)$$

and the mapping

$$\phi(x) = \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_1 x_2 \\ x_2^2 \end{pmatrix} \qquad (1.26)$$

are such that $\phi(x)^T\phi(z) = \left(x^T z\right)^2$. Note that in (1.25) we have $\phi : R^2 \to R^3$, while in (1.26) we have $\phi : R^2 \to R^4$. It is possible to show that, for the homogeneous polynomial kernel $K(x,z) = \left(x^T z\right)^p$, the dimension of the minimum embedding space is $\binom{n+p-1}{p}$. As a consequence, for sufficiently large values of the exponent $p$ and of the parameter $C$ the training error is zero.

Finally, we observe that an SVM with Gaussian kernel corresponds to an RBF neural network where the number of basis functions and their centres are automatically determined by the number of support vectors and their values. In a similar way, an SVM with hyperbolic tangent kernel corresponds to an MLP neural network with one hidden layer, where the number of neurons and the values of the weights are automatically determined by the number of support vectors and their values.
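A quick numerical check of Remark 1.2, verifying that the two mappings (1.25) and (1.26) both reproduce the homogeneous polynomial kernel (illustrative sketch):

```python
import numpy as np

def phi3(x):
    # Mapping (1.25): R^2 -> R^3
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def phi4(x):
    # Mapping (1.26): R^2 -> R^4
    return np.array([x[0]**2, x[0]*x[1], x[0]*x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
k = (x @ z) ** 2                                # kernel (x^T z)^2
print(k, phi3(x) @ phi3(z), phi4(x) @ phi4(z))  # all three values coincide
```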

1.5 A general decomposition scheme for SVM training

Let us consider the convex quadratic programming problem for SVM training in the case of classification problems:

$$\min \ f(\alpha) = \frac{1}{2}\alpha^T Q\alpha - e^T\alpha \quad \text{s.t.} \quad y^T\alpha = 0, \quad 0 \le \alpha \le C, \qquad (1.27)$$

where $\alpha \in R^\ell$, $\ell$ is the number of training data, $Q$ is a symmetric positive semidefinite $\ell\times\ell$ matrix, $e \in R^\ell$ is the vector of ones, $y \in \{-1,1\}^\ell$, and $C$ is a positive scalar. The generic element $q_{ij}$ of $Q$ is $y_i y_j K(x^i, x^j)$, where $K(x,z) = \phi(x)^T\phi(z)$ is the kernel function related to the nonlinear function $\phi$ that maps the data from the input space into the feature space.

We assume that the number $\ell$ of training data is huge and that the Hessian matrix $Q$, which is dense, cannot be fully stored, so that standard methods for quadratic programming cannot be used. Hence, the strategy adopted to solve the SVM problem is usually based on the decomposition of the original problem into a sequence of smaller subproblems obtained by fixing subsets of variables.

In a general decomposition framework, at each iteration $k$, the vector of variables $\alpha^k$ is partitioned into two subvectors $(\alpha^k_W, \alpha^k_{\overline W})$, where the index set $W \subset \{1,\ldots,\ell\}$ identifies the variables of the subproblem to be solved and is called the working set, and $\overline W = \{1,\ldots,\ell\}\setminus W$ (for notational convenience, we omit the dependence on $k$). Starting from the current solution $\alpha^k = (\alpha^k_W, \alpha^k_{\overline W})$, which is a feasible point, the subvector $\alpha^{k+1}_W$ is computed as the solution of the subproblem

$$\min_{\alpha_W} \ f(\alpha_W, \alpha^k_{\overline W}) \quad \text{s.t.} \quad y_W^T\alpha_W = -y_{\overline W}^T\alpha^k_{\overline W}, \quad 0 \le \alpha_W \le C. \qquad (1.28)$$

The variables corresponding to $\overline W$ are unchanged, that is, $\alpha^{k+1}_{\overline W} = \alpha^k_{\overline W}$, and the current solution is updated by setting $\alpha^{k+1} = (\alpha^{k+1}_W, \alpha^{k+1}_{\overline W})$. The cardinality $q$ of the working set, namely the dimension of the subproblem, must be greater than or equal to 2, otherwise we would have $\alpha^{k+1} = \alpha^k$. Indeed, assuming $q = 1$ and $W = \{i\}$, if $\alpha^k$ is a feasible point then we have $y_i\alpha^k_i = -y_{\overline W}^T\alpha^k_{\overline W}$. In order to guarantee that $\alpha^{k+1}$ is a feasible point we must have $y_i\alpha^{k+1}_i = -y_{\overline W}^T\alpha^k_{\overline W}$, and hence $\alpha^{k+1} = \alpha^k$.

A general decomposition scheme is described below.

Decomposition algorithm

Data. A feasible point $\alpha^0$ (usually $\alpha^0 = 0$).
Initialization. Set $k = 0$.
While (the stopping criterion is not satisfied)
1. select the working set $W^k$;
2. set $W = W^k$ and compute a solution $\alpha^*_W$ of subproblem (1.28);
3. set $\alpha^{k+1}_i = \begin{cases}\alpha^*_i & \text{if } i \in W \\ \alpha^k_i & \text{otherwise;}\end{cases}$
4. set $\nabla f(\alpha^{k+1}) = \nabla f(\alpha^k) + Q\left(\alpha^{k+1} - \alpha^k\right)$;
5. set $k = k + 1$.
end while
Return $\alpha^* = \alpha^k$

The selection rule of the working set strongly affects both the speed of the algorithm and its convergence properties. In computational terms, the most expensive step at each iteration of a decomposition method is the evaluation of the kernel to compute the columns of the Hessian matrix corresponding to the indices in the working set $W$. In the sequel we will mainly focus on algorithms using working sets of cardinality two, since they are the most widely used algorithms to solve the large quadratic programs arising in SVM training.

1.6 Sequential Minimal Optimization (SMO) algorithms

The decomposition methods usually adopted are the so-called Sequential Minimal Optimization (SMO) algorithms, which update at each iteration the minimum number of variables, that is, two. At each iteration, an SMO algorithm requires the solution of a convex quadratic programming problem in two variables:

$$\begin{aligned} \min \ & q(\alpha_i,\alpha_j) = \frac{1}{2}\begin{pmatrix}\alpha_i & \alpha_j\end{pmatrix}\begin{pmatrix} q_{ii} & q_{ij} \\ q_{ji} & q_{jj}\end{pmatrix}\begin{pmatrix}\alpha_i \\ \alpha_j\end{pmatrix} + c_i\alpha_i + c_j\alpha_j \\ \text{s.t.}\ & y_i\alpha_i + y_j\alpha_j = -y_{\overline W}^T\alpha^k_{\overline W} \\ & 0 \le \alpha_h \le C, \quad h = i, j, \end{aligned} \qquad (1.29)$$

where the coefficients $c_i = \sum_{h\notin W} q_{ih}\alpha^k_h - 1$ and $c_j = \sum_{h\notin W} q_{jh}\alpha^k_h - 1$ collect the linear terms coming from $-e^T\alpha$ and from the fixed variables.

We first show that the solution of a subproblem in two variables of the form (1.29) can be analytically determined (and this is one of the reasons motivating the interest in defining SMO algorithms). To this aim, given a feasible point $\bar\alpha$ and a feasible direction $d$, we indicate by $\bar\beta$ the maximum feasible step length along $d$ starting from $\bar\alpha$, i.e.,

- if $d_1 > 0$ and $d_2 > 0$, $\bar\beta = \min\{C - \bar\alpha_1, C - \bar\alpha_2\}$;
- if $d_1 < 0$ and $d_2 < 0$, $\bar\beta = \min\{\bar\alpha_1, \bar\alpha_2\}$;
- if $d_1 > 0$ and $d_2 < 0$, $\bar\beta = \min\{C - \bar\alpha_1, \bar\alpha_2\}$;
- if $d_1 < 0$ and $d_2 > 0$, $\bar\beta = \min\{\bar\alpha_1, C - \bar\alpha_2\}$.

We set $d^+ = \begin{pmatrix} 1/y_1 \\ -1/y_2 \end{pmatrix}$ and we report below the scheme for the analytical computation of the solution of problem (1.29).

Analytical computation of the solution of the two-dimensional problem

1. If $\nabla f(\bar\alpha)^T d^+ = 0$ set $\alpha^* = \bar\alpha$ and stop;
2. If $\nabla f(\bar\alpha)^T d^+ < 0$ set $d^* = d^+$;
3. If $\nabla f(\bar\alpha)^T d^+ > 0$ set $d^* = -d^+$;
4. Let $\bar\beta$ be the maximum feasible step length along $d^*$:
4a. if $\bar\beta = 0$ set $\alpha^* = \bar\alpha$ and stop;
4b. if $(d^*)^T Q d^* = 0$ set $\beta^* = \bar\beta$, otherwise compute $\beta_{nv} = -\dfrac{\nabla f(\bar\alpha)^T d^*}{(d^*)^T Q d^*}$ and set $\beta^* = \min\{\bar\beta, \beta_{nv}\}$;
4c. set $\alpha^* = \bar\alpha + \beta^* d^*$ and stop.
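A minimal sketch of the analytic scheme above (our own naming; inputs are the two current variables, the corresponding gradient components, the 2x2 block of $Q$ and the two labels; exact zero tests stand in for tolerance checks):

```python
import numpy as np

def solve_two_var(alpha_bar, grad, Q2, y, C):
    """Analytic solution of the two-variable subproblem (1.29).
    alpha_bar: current feasible values of the two variables, shape (2,)
    grad:      the two gradient components of f at the current point
    Q2:        the 2x2 block [[q11, q12], [q21, q22]]
    y:         the two labels (+1 or -1)
    C:         upper bound of the box constraints"""
    d_plus = np.array([1.0 / y[0], -1.0 / y[1]])
    g = grad @ d_plus
    if g == 0.0:                              # step 1
        return alpha_bar.copy()
    d = d_plus if g < 0.0 else -d_plus        # steps 2-3
    # Maximum feasible step length beta_bar along d (the four cases above).
    beta_bar = min(C - alpha_bar[h] if d[h] > 0 else alpha_bar[h]
                   for h in range(2))
    if beta_bar == 0.0:                       # step 4a
        return alpha_bar.copy()
    curv = d @ Q2 @ d
    beta = beta_bar if curv == 0.0 else min(beta_bar, -(grad @ d) / curv)
    return alpha_bar + beta * d               # step 4c
```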

Now, let us consider the selection rule for choosing, at each iteration $k$, the two variables of subproblem (1.29). The selection rule should ensure a strict decrease of the objective function. The new point $\alpha^{k+1}$ is obtained by updating two variables, indicated by $\alpha_i$ and $\alpha_j$, so that we have

$$\alpha^{k+1} = \left(\alpha^k_1, \ldots, \alpha^{k+1}_i, \ldots, \alpha^{k+1}_j, \ldots, \alpha^k_\ell\right)^T. \qquad (1.30)$$

We observe that:

(i) the new point $\alpha^{k+1}$ must be feasible;
(ii) if $\alpha^k$ is not a solution, then we require $f(\alpha^{k+1}) < f(\alpha^k)$.

According to (i), (ii) and (1.30), we focus on feasible and descent directions (see (i) and (ii)) with only two nonzero elements (see (1.30)). Therefore, in the sequel we will analyze directions having these features, which play a crucial role in the decomposition strategies for the considered problem.

Let us consider the feasible set of problem (1.27), indicated by $F$, namely

$$F = \{\alpha \in R^\ell : y^T\alpha = 0, \ 0 \le \alpha \le C\}.$$

Given any feasible point $\alpha$, we indicate as follows the index sets of the active box (lower and upper) constraints:

$$L(\alpha) = \{i : \alpha_i = 0\}, \qquad U(\alpha) = \{i : \alpha_i = C\}.$$

The set of feasible directions at a point $\alpha \in F$ is the cone

$$D(\alpha) = \{d \in R^\ell : y^T d = 0, \ d_i \ge 0 \ \forall i \in L(\alpha), \ \text{and} \ d_i \le 0 \ \forall i \in U(\alpha)\}.$$

Indeed, if $d$ is a feasible direction at $\alpha \in F$, then for $t > 0$ sufficiently small we must have

$$y^T(\alpha + td) = 0, \quad \alpha + td \ge 0, \quad \alpha + td \le C,$$

from which it necessarily follows

$$y^T d = 0, \quad d_i \ge 0 \ \text{if} \ \alpha_i = 0, \quad d_i \le 0 \ \text{if} \ \alpha_i = C.$$

Given a point $\bar\alpha \in F$, we define feasible directions at $\bar\alpha$ with only two nonzero components $d_i$ and $d_j$. We indicate by $d^{i,j}$ the direction

$$d^{i,j} = (0,\ldots,0,\, d_i,\, 0,\ldots,0,\, d_j,\, 0,\ldots,0)^T.$$

Since we must have

$$y^T d^{i,j} = y_i d_i + y_j d_j = 0,$$

we set

$$d_i = \frac{1}{y_i}, \qquad d_j = -\frac{1}{y_j}.$$

Furthermore, we remark that:

- if $i \in L(\bar\alpha)$, namely $\bar\alpha_i = 0$, we must have $d_i \ge 0$, and hence $y_i > 0$;
- if $i \in U(\bar\alpha)$, namely $\bar\alpha_i = C$, we must have $d_i \le 0$, and hence $y_i < 0$;
- if $j \in L(\bar\alpha)$, namely $\bar\alpha_j = 0$, we must have $d_j \ge 0$, and hence $y_j < 0$;
- if $j \in U(\bar\alpha)$, namely $\bar\alpha_j = C$, we must have $d_j \le 0$, and hence $y_j > 0$.

Note that, whenever $0 < \bar\alpha_i < C$, there are no constraints on the sign of $d_i$ (and hence on $y_i$). In the same way, whenever $0 < \bar\alpha_j < C$, there are no constraints on the sign of $d_j$ (and hence on $y_j$). On the basis of the preceding considerations we partition the sets $L$ and $U$ into the subsets $L^-$, $L^+$ and $U^-$, $U^+$ respectively, where

$$L^+(\alpha) = \{i \in L(\alpha) : y_i > 0\}, \qquad L^-(\alpha) = \{i \in L(\alpha) : y_i < 0\},$$
$$U^+(\alpha) = \{i \in U(\alpha) : y_i > 0\}, \qquad U^-(\alpha) = \{i \in U(\alpha) : y_i < 0\}.$$

We observe that if:

- $i$ belongs to $L^+$ or to $U^-$, and
- $j$ belongs to $L^-$ or to $U^+$,

then the corresponding direction $d^{i,j}$ is a feasible direction at $\bar\alpha$. In order to characterize the feasible directions (with only two nonzero components) at $\bar\alpha$, let us define the following index sets

$$R(\bar\alpha) = L^+(\bar\alpha) \cup U^-(\bar\alpha) \cup \{i : 0 < \bar\alpha_i < C\}, \qquad S(\bar\alpha) = L^-(\bar\alpha) \cup U^+(\bar\alpha) \cup \{i : 0 < \bar\alpha_i < C\}. \qquad (1.31)$$

Note that

$$R(\bar\alpha) \cap S(\bar\alpha) = \{i : 0 < \bar\alpha_i < C\}, \qquad R(\bar\alpha) \cup S(\bar\alpha) = \{1,\ldots,\ell\}.$$

Moreover, it is easy to see that both $R(\bar\alpha)$ and $S(\bar\alpha)$ are nonempty. The two index sets $R$ and $S$ allow us to define all the feasible and descent directions with only two nonzero components. This is shown in the next proposition.

Proposition 1.2. Let $\bar\alpha$ be a feasible point and let $(i, j) \in \{1,\ldots,\ell\}$, $i \ne j$, be a pair of indices. Then the direction $d^{i,j} \in R^\ell$ such that

$$d^{i,j}_h = \begin{cases} 1/y_i & \text{if } h = i \\ -1/y_j & \text{if } h = j \\ 0 & \text{otherwise} \end{cases}$$

(i) is a feasible direction at the point $\bar\alpha$ if and only if $i \in R(\bar\alpha)$ and $j \in S(\bar\alpha)$;
(ii) is a descent direction for $f$ at $\bar\alpha$ if and only if

$$\frac{(\nabla f(\bar\alpha))_i}{y_i} - \frac{(\nabla f(\bar\alpha))_j}{y_j} < 0. \qquad (1.32)$$

Proof. (ia) Assume that $d^{i,j}$ is a feasible direction. We show that $i \in R(\bar\alpha)$ and $j \in S(\bar\alpha)$. By contradiction, assume that $i \in R(\bar\alpha)$ and $j \notin S(\bar\alpha)$, that is, $j \in L^+(\bar\alpha) \cup U^-(\bar\alpha)$. If $j \in L^+(\bar\alpha)$ then $\bar\alpha_j = 0$ and $y_j = 1$, so that $d_j < 0$, and hence $d^{i,j}$ is not a feasible direction since $\bar\alpha_j + td_j < 0$ for all $t > 0$. In the same way, if $j \in U^-(\bar\alpha)$ then $\bar\alpha_j = C$ and $y_j = -1$, so that $d_j > 0$, and hence $d^{i,j}$ is not a feasible direction.

(ib) Assume that $i \in R(\bar\alpha)$ and $j \in S(\bar\alpha)$. We must show that $d^{i,j}$ is such that

$$y^T d^{i,j} = 0, \quad d^{i,j}_h \ge 0 \ \forall h \in L(\bar\alpha), \quad \text{and} \quad d^{i,j}_h \le 0 \ \forall h \in U(\bar\alpha).$$

From the definition of $d^{i,j}$ it follows that $y^T d^{i,j} = y_i d^{i,j}_i + y_j d^{i,j}_j = 0$. Moreover, we have $i \in R(\bar\alpha)$, and hence, if $i \in L(\bar\alpha)$, then (1.31) implies $i \in L^+(\bar\alpha)$, that is, $d_i = 1/y_i > 0$. Analogously, we have $j \in S(\bar\alpha)$, and hence, if $j \in U(\bar\alpha)$, then $j \in U^+(\bar\alpha)$, that is, $d_j = -1/y_j < 0$. The same conclusions can be drawn in the case that $i \in U(\bar\alpha)$ and $j \in L(\bar\alpha)$, and hence we can conclude that $d^{i,j}$ is a feasible direction.

(ii) Since $f$ is a convex and continuously differentiable function, the condition

$$\nabla f(\bar\alpha)^T d^{i,j} = \frac{(\nabla f(\bar\alpha))_i}{y_i} - \frac{(\nabla f(\bar\alpha))_j}{y_j} < 0$$

is necessary and sufficient to ensure that $d^{i,j}$ is a descent direction for $f$ at $\bar\alpha$.

If the pair of indices $(i, j)$ defines a feasible descent direction $d^{i,j}$ at the current point $\alpha^k$, then the minimization with respect to the pair of variables $\alpha_i$ and $\alpha_j$ will produce a strict decrease of the objective function. Then, taking into account Proposition 1.2, we define the formal scheme of an SMO algorithm.

SMO Algorithm

Data. The starting point $\alpha^0 = 0$ and the gradient $\nabla f(\alpha^0) = -e$.
Initialization. Set $k = 0$.
While (the stopping criterion is not satisfied)
1. select $i \in R(\alpha^k)$, $j \in S(\alpha^k)$ such that $\nabla f(\alpha^k)^T d^{i,j} < 0$, and set $W = \{i, j\}$;
2. compute the analytic solution $\alpha^* = \begin{pmatrix}\alpha_i^* & \alpha_j^*\end{pmatrix}^T$ of (1.29);
3. set $\alpha^{k+1}_h = \begin{cases}\alpha_i^* & \text{for } h = i \\ \alpha_j^* & \text{for } h = j \\ \alpha^k_h & \text{otherwise;}\end{cases}$
4. set $\nabla f(\alpha^{k+1}) = \nabla f(\alpha^k) + (\alpha^{k+1}_i - \alpha^k_i)Q_i + (\alpha^{k+1}_j - \alpha^k_j)Q_j$;
5. set $k = k + 1$.
end while
Return $\alpha^* = \alpha^k$

We observe that the above scheme generates a sequence $\{\alpha^k\}$ such that

$$f(\alpha^{k+1}) < f(\alpha^k). \qquad (1.33)$$

The scheme requires storing a vector of size $\ell$ (the gradient $\nabla f(\alpha^k)$) and computing two columns, $Q_i$ and $Q_j$, of the matrix $Q$. The choice $\alpha^0 = 0$ for the starting point is motivated by the fact that this point is feasible and that the computation of the gradient $\nabla f(\alpha^0)$ does not require any element of the matrix $Q$, since $\nabla f(0) = -e$.

We remark that condition (1.33) is not sufficient to guarantee convergence towards solutions of the problem. In order to guarantee global convergence properties of the generated sequence, suitable working set selection rules must be adopted: for instance, Gauss-Southwell rules based on the violation of the optimality conditions analyzed in the next section.
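A minimal sketch of the SMO Algorithm above, under the assumptions that the columns of $Q$ are produced on demand by a user-supplied function, that the working-set rule (for instance the most violating pair rule of the next section) returns None when the stopping test holds, and that solve_two_var from the earlier sketch is in scope; all names are ours:

```python
import numpy as np

def smo(Q_col, y, C, l, select_pair, max_iter=10**6):
    """Sketch of the SMO Algorithm.
    Q_col(i):               returns column i of Q (from the kernel, on demand)
    select_pair(alpha, g):  working-set rule; returns (i, j), or None when
                            the stopping criterion is satisfied
    y, shape (l,):          labels in {-1, +1} as a numpy array"""
    alpha = np.zeros(l)        # feasible starting point alpha^0 = 0
    grad = -np.ones(l)         # grad f(0) = -e
    for _ in range(max_iter):
        pair = select_pair(alpha, grad)                   # step 1
        if pair is None:
            break
        i, j = pair
        Qi, Qj = Q_col(i), Q_col(j)
        Q2 = np.array([[Qi[i], Qi[j]], [Qi[j], Qj[j]]])   # 2x2 block (Q symmetric)
        new = solve_two_var(alpha[[i, j]], grad[[i, j]],
                            Q2, y[[i, j]], C)             # step 2
        # Step 4: the gradient update needs only the two columns Q_i and Q_j.
        grad += (new[0] - alpha[i]) * Qi + (new[1] - alpha[j]) * Qj
        alpha[i], alpha[j] = new[0], new[1]               # step 3
    return alpha
```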

1.7 Convergent SMO algorithms using first order information

As already seen, SMO-type algorithms select at each iteration a working set of size exactly two, so that the updated point can be analytically computed, and this eliminates the need for optimization software. In order to design convergent SMO algorithms we need to state the optimality conditions in a form useful for defining suitable working set selection rules. For the sake of generality, in the sequel we will assume that $f$ is a convex continuously differentiable function.

Let $F$ be the feasible set of problem (1.27), that is

$$F = \{\alpha \in R^\ell : y^T\alpha = 0, \ 0 \le \alpha \le C\}.$$

Given a feasible point $\alpha$, we have already introduced the index sets of the active box (lower and upper) constraints

$$L(\alpha) = \{i : \alpha_i = 0\}, \qquad U(\alpha) = \{i : \alpha_i = C\},$$

and the index sets

$$R(\alpha) = L^+(\alpha) \cup U^-(\alpha) \cup \{i : 0 < \alpha_i < C\}, \qquad S(\alpha) = L^-(\alpha) \cup U^+(\alpha) \cup \{i : 0 < \alpha_i < C\},$$

where

$$L^+(\alpha) = \{i \in L(\alpha) : y_i > 0\}, \quad L^-(\alpha) = \{i \in L(\alpha) : y_i < 0\}, \quad U^+(\alpha) = \{i \in U(\alpha) : y_i > 0\}, \quad U^-(\alpha) = \{i \in U(\alpha) : y_i < 0\}.$$

The introduction of the index sets $R(\alpha)$ and $S(\alpha)$ allows us to state the optimality conditions in the following form (the proof of the proposition can be found in Appendix C).

Proposition 1.3. A feasible point $\alpha^*$ is a solution of (1.27) if and only if

$$\max_{i \in R(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\} \le \min_{j \in S(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_j}{y_j}\right\}. \qquad (1.34)$$

Given a feasible point $\bar\alpha$ which is not a solution of problem (1.27), a pair $i \in R(\bar\alpha)$, $j \in S(\bar\alpha)$ such that

$$-\frac{(\nabla f(\bar\alpha))_i}{y_i} > -\frac{(\nabla f(\bar\alpha))_j}{y_j}$$

is said to be a violating pair. It can be shown (see Proposition 1.2) that SMO-type algorithms attain a strict decrease of the objective function if and only if the working set is a violating pair.

However, as already said, the use of generic violating pairs as working sets is not sufficient to guarantee convergence properties of the sequence generated by a decomposition algorithm. A convergent SMO algorithm can be defined using as indices of the working set those corresponding to the maximal violation of the KKT conditions. More specifically, given again a feasible point $\alpha$ which is not a solution of problem (1.27), let us define

$$I(\alpha) = \left\{ i : i \in \arg\max_{i \in R(\alpha)} \left\{-\frac{(\nabla f(\alpha))_i}{y_i}\right\} \right\}, \qquad J(\alpha) = \left\{ j : j \in \arg\min_{j \in S(\alpha)} \left\{-\frac{(\nabla f(\alpha))_j}{y_j}\right\} \right\}.$$

Taking into account the KKT conditions as stated in (1.34), a pair $i \in I(\alpha)$, $j \in J(\alpha)$ most violates the optimality conditions and, therefore, is said to be a maximal violating pair. Note that the selection of a maximal violating pair involves $O(\ell)$ operations. An SMO-type algorithm using maximal violating pairs as working sets is usually called a most violating pair (MVP) algorithm; it is formally described below.

Most Violating Pair (MVP) Algorithm

Data. The starting point $\alpha^0 = 0$ and the gradient $\nabla f(\alpha^0) = -e$.
Initialization. Set $k = 0$.
While (the stopping criterion is not satisfied)
1. select $i \in I(\alpha^k)$, $j \in J(\alpha^k)$, and set $W = \{i, j\}$;
2. compute the analytic solution $\alpha^* = \begin{pmatrix}\alpha_i^* & \alpha_j^*\end{pmatrix}^T$ of (1.29);
3. set $\alpha^{k+1}_h = \begin{cases}\alpha_i^* & \text{for } h = i \\ \alpha_j^* & \text{for } h = j \\ \alpha^k_h & \text{otherwise;}\end{cases}$
4. set $\nabla f(\alpha^{k+1}) = \nabla f(\alpha^k) + (\alpha^{k+1}_i - \alpha^k_i)Q_i + (\alpha^{k+1}_j - \alpha^k_j)Q_j$;
5. set $k = k + 1$.
end while
Return $\alpha^* = \alpha^k$
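A sketch of the maximal violating pair selection (our naming), which can serve as the select_pair rule in the earlier SMO loop; it assumes both $R(\alpha)$ and $S(\alpha)$ are nonempty (both classes present), and its tolerance test anticipates the stopping criterion (1.46) of Section 1.9:

```python
import numpy as np

def mvp_pair(alpha, grad, y, C, eps=1e-3):
    """Most violating pair selection: i in I(alpha), j in J(alpha).
    Returns None when m(alpha) <= M(alpha) + eps, i.e. when the stopping
    criterion (1.46) is met."""
    v = -grad * y          # -(grad f(alpha))_h / y_h, since y_h is +1 or -1
    free = (alpha > 0) & (alpha < C)
    R = free | ((alpha <= 0) & (y > 0)) | ((alpha >= C) & (y < 0))
    S = free | ((alpha <= 0) & (y < 0)) | ((alpha >= C) & (y > 0))
    i = np.where(R)[0][np.argmax(v[R])]    # m(alpha) is attained at i
    j = np.where(S)[0][np.argmin(v[S])]    # M(alpha) is attained at j
    if v[i] <= v[j] + eps:                 # optimality up to the tolerance
        return None
    return i, j
```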

We can state the following convergence result.

Proposition 1.4. Suppose that the symmetric matrix $Q$ is positive semidefinite, and let $\{\alpha^k\}$ be the sequence generated by the MVP Algorithm. Then $\{\alpha^k\}$ admits limit points, and each limit point is a solution of problem (1.27).

A usual requirement to establish convergence properties in the context of a decomposition strategy is that

$$\lim_{k\to\infty} \left(\alpha^{k+1} - \alpha^k\right) = 0. \qquad (1.35)$$

Indeed, in a decomposition method, at the end of each iteration $k$, only the satisfaction of the optimality conditions with respect to the variables associated with $W^k$ is ensured. Therefore, to get convergence towards KKT points, it may be necessary to ensure that consecutive points, which are solutions of the corresponding subproblems, tend to the same limit point. It can be proved that SMO algorithms guarantee property (1.35) (the proof fully exploits the fact that the subproblems are convex quadratic problems in two variables).

The convergence result of Proposition 1.4 can be obtained even using working set rules different from the one selecting the maximal violating pair. For instance, the so-called constant-factor violating pair rule guarantees global convergence properties of the SMO algorithm adopting it, and requires selecting any violating pair $u \in R(\alpha^k)$, $v \in S(\alpha^k)$ such that

$$-\frac{(\nabla f(\alpha^k))_u}{y_u} + \frac{(\nabla f(\alpha^k))_v}{y_v} \ge \sigma\left(-\frac{(\nabla f(\alpha^k))_i}{y_i} + \frac{(\nabla f(\alpha^k))_j}{y_j}\right), \qquad (1.36)$$

where $0 < \sigma \le 1$ and $(i, j)$ is a maximal violating pair.

1.8 Convergent SMO algorithms using second order information

It can be proved that the direction $d^{i,j}$, $(i,j)$ being the maximal violating pair, is the solution of the following problem

$$\begin{aligned} \min_d \ & \nabla f(\alpha^k)^T d \\ & y^T d = 0 \\ & d_t \ge 0 \quad \text{if } \alpha^k_t = 0 \\ & d_t \le 0 \quad \text{if } \alpha^k_t = C \\ & -1 \le d_t \le 1 \\ & |\{h : d_h \ne 0\}| = 2 \end{aligned} \qquad (1.37)$$

The inequalities $-1 \le d_t \le 1$ prevent the objective function from tending to $-\infty$. Hence, the maximal violating pair is related to the minimization of the first order approximation

$$f(\alpha^k + d) \approx f(\alpha^k) + \nabla f(\alpha^k)^T d.$$

As $f$ is quadratic, we can write

$$f(\alpha^k + d) = f(\alpha^k) + \nabla f(\alpha^k)^T d + \frac{1}{2} d^T Q d,$$

and we can consider, instead of problem (1.37), the following problem

$$\begin{aligned} \min_d \ & q(d) = \nabla f(\alpha^k)^T d + \frac{1}{2} d^T Q d \\ & y^T d = 0 \\ & d_t \ge 0 \quad \text{if } \alpha^k_t = 0 \\ & d_t \le 0 \quad \text{if } \alpha^k_t = C \\ & |\{h : d_h \ne 0\}| = 2 \end{aligned} \qquad (1.38)$$

Note that the constraints $-1 \le d_t \le 1$ have been removed since, under suitable assumptions on $Q$, the objective function is bounded below. In particular, we assume that $Q$ is positive definite. Solving (1.38) exactly would require $O(\ell^2)$ operations, and this could be impractical whenever the number $\ell$ of training data is huge. In order to deal with this issue, a suitable working set selection rule using second order information has been designed, which requires $O(\ell)$ operations. More specifically, one index is selected as in the maximal violating pair rule, say $i$ such that

$$i \in I(\alpha^k) = \arg\max_{h \in R(\alpha^k)} \left\{-\frac{\nabla_h f(\alpha^k)}{y_h}\right\}, \qquad (1.39)$$

and the other index $j$ as the solution of a min-min problem, i.e.,

$$j \in \arg\min_{\substack{t \in S(\alpha^k):\\ y_i\nabla_i f(\alpha^k) - y_t\nabla_t f(\alpha^k) < 0}} \left\{ \min_{d_i, d_t} \ \nabla_i f(\alpha^k)\, d_i + \nabla_t f(\alpha^k)\, d_t + \frac{1}{2}\begin{pmatrix} d_i & d_t \end{pmatrix}\begin{pmatrix} q_{ii} & q_{it} \\ q_{it} & q_{tt} \end{pmatrix}\begin{pmatrix} d_i \\ d_t \end{pmatrix} \ \text{s.t.} \ y_i d_i + y_t d_t = 0 \right\}. \qquad (1.40)$$

Note that the condition $t \in S(\alpha^k)$ with $y_i\nabla_i f(\alpha^k) - y_t\nabla_t f(\alpha^k) < 0$ implies that the selected pair $(i, j)$, with $j$ a solution of (1.40), is a violating pair. We have omitted the constraints

$$d_t \ge 0 \ \text{if } \alpha^k_t = 0, \qquad d_t \le 0 \ \text{if } \alpha^k_t = C, \qquad (1.41)$$

since, as shown later, they are satisfied at the optimal solution. Given $t \in S(\alpha^k)$ such that $y_i\nabla_i f(\alpha^k) - y_t\nabla_t f(\alpha^k) < 0$, let us consider the inner problem

$$\min_{d_i, d_t} \ \nabla_i f(\alpha^k)\, d_i + \nabla_t f(\alpha^k)\, d_t + \frac{1}{2}\begin{pmatrix} d_i & d_t \end{pmatrix}\begin{pmatrix} q_{ii} & q_{it} \\ q_{it} & q_{tt} \end{pmatrix}\begin{pmatrix} d_i \\ d_t \end{pmatrix} \quad \text{s.t.} \quad y_i d_i + y_t d_t = 0. \qquad (1.42)$$

From the constraint $y_i d_i + y_t d_t = 0$ we get $d_i = -\dfrac{y_t}{y_i} d_t$. By substitution and simple calculations, recalling that $q_{ii} = K_{ii}$ and $q_{it} = y_i y_t K_{it}$, we obtain the equivalent problem

$$\min_{d_t} \ \frac{1}{2}(K_{ii} + K_{tt} - 2K_{it})\, y_t^2\, d_t^2 + \left(-\frac{\nabla_i f(\alpha^k)}{y_i} + \frac{\nabla_t f(\alpha^k)}{y_t}\right) y_t\, d_t. \qquad (1.43)$$

The assumption that $Q$ is positive definite implies $K_{ii} + K_{tt} - 2K_{it} > 0$, and hence subproblem (1.43) admits a solution. In particular, the solution is such that

$$y_t d_t = -y_i d_i = -\frac{b_{it}}{a_{it}} < 0, \qquad (1.44)$$

where

$$a_{it} = K_{ii} + K_{tt} - 2K_{it} > 0, \qquad b_{it} = -y_i\nabla_i f(\alpha^k) + y_t\nabla_t f(\alpha^k) > 0.$$

Note that the constraints (1.41) are satisfied. Indeed, suppose, for instance, that $\alpha^k_i = C$ and $\alpha^k_t = 0$. Since $i \in R(\alpha^k)$ and $t \in S(\alpha^k)$, by definition we have $y_i = -1$ and $y_t = -1$. From (1.44) it follows $d_i < 0$ and $d_t > 0$. The other cases are similar.

The optimal value of problem (1.42) is $-\dfrac{b_{it}^2}{2a_{it}}$. Therefore, the index $j$ corresponding to the solution of problem (1.40) can be computed as follows

$$j \in \arg\min_t \left\{ -\frac{b_{it}^2}{a_{it}} : t \in S(\alpha^k), \ y_i\nabla_i f(\alpha^k) - y_t\nabla_t f(\alpha^k) < 0 \right\}. \qquad (1.45)$$

It can be proved that the pair $(i, j)$, with $i$ satisfying (1.39) and $j$ satisfying (1.45), is a constant-factor violating pair. Indeed, let $(i, j^*)$ be a maximal violating pair. As $j^* \in S(\alpha^k)$ is feasible for (1.45), we can write

$$\frac{b_{ij}^2}{a_{ij}} \ge \frac{b_{ij^*}^2}{a_{ij^*}},$$

from which it follows

$$b_{ij} \ge \left(\frac{a_{ij}}{a_{ij^*}}\right)^{1/2} b_{ij^*} \ge \left(\frac{\min_{s,t} a_{st}}{\max_{s,t} a_{st}}\right)^{1/2} b_{ij^*}.$$

Then, condition (1.36) holds with

$$\sigma = \left(\frac{\min_{s,t} a_{st}}{\max_{s,t} a_{st}}\right)^{1/2}.$$

Therefore, an SMO algorithm using the second order working set rule defined by (1.39) and (1.45) is a convergent algorithm. We remark that the selection of the above pair $(i, j)$ requires $O(\ell)$ operations.
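A sketch of the second order selection rule (our naming), assuming for simplicity that the full Gram matrix $K$ is available (only its diagonal and row $i$ are actually used) and that $Q$ is positive definite, so that $a_{it} > 0$:

```python
import numpy as np

def second_order_pair(alpha, grad, y, C, K, eps=1e-3):
    """Second order working set selection: i by the first order rule (1.39),
    j by the closed-form min-min rule (1.45)."""
    v = -grad * y                                    # -(grad f)_h / y_h
    free = (alpha > 0) & (alpha < C)
    R = free | ((alpha <= 0) & (y > 0)) | ((alpha >= C) & (y < 0))
    S = free | ((alpha <= 0) & (y < 0)) | ((alpha >= C) & (y > 0))
    i = np.where(R)[0][np.argmax(v[R])]              # rule (1.39)
    cand = np.where(S & (v < v[i] - eps))[0]         # violating t's in S
    if cand.size == 0:                               # stopping test met
        return None
    b = v[i] - v[cand]                               # b_it > 0
    a = K[i, i] + K[cand, cand] - 2.0 * K[i, cand]   # a_it > 0
    j = cand[np.argmin(-(b ** 2) / a)]               # rule (1.45)
    return i, j
```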

1.9 On the stopping criterion

Let us introduce the functions $m(\alpha), M(\alpha) : F \to R$:

$$m(\alpha) = \begin{cases} \max_{h \in R(\alpha)} \left\{-\dfrac{(\nabla f(\alpha))_h}{y_h}\right\} & \text{if } R(\alpha) \ne \emptyset \\ -\infty & \text{otherwise,} \end{cases} \qquad M(\alpha) = \begin{cases} \min_{h \in S(\alpha)} \left\{-\dfrac{(\nabla f(\alpha))_h}{y_h}\right\} & \text{if } S(\alpha) \ne \emptyset \\ +\infty & \text{otherwise,} \end{cases}$$

where $R(\alpha)$ and $S(\alpha)$ are the index sets previously defined. From the definitions of $m(\alpha)$ and $M(\alpha)$, and using Proposition 1.3, it follows that $\bar\alpha$ is a solution of (1.27) if and only if $m(\bar\alpha) \le M(\bar\alpha)$.

Let us consider a sequence of feasible points $\{\alpha^k\}$ convergent to a solution $\bar\alpha$. At each iteration $k$, if $\alpha^k$ is not a solution then (using again Proposition 1.3) we have $m(\alpha^k) > M(\alpha^k)$. Therefore, a commonly adopted stopping criterion is

$$m(\alpha^k) \le M(\alpha^k) + \varepsilon, \qquad (1.46)$$

where $\varepsilon > 0$. Note that the functions $m(\alpha)$ and $M(\alpha)$ are not continuous. Indeed, even assuming $\alpha^k \to \bar\alpha$ for $k \to \infty$, it may happen that $R(\alpha^k) \ne R(\bar\alpha)$ or $S(\alpha^k) \ne S(\bar\alpha)$ for $k$ sufficiently large. As a consequence, in general we cannot write

$$\lim_{k\to\infty} m(\alpha^k) = m(\bar\alpha) \quad \text{or} \quad \lim_{k\to\infty} M(\alpha^k) = M(\bar\alpha).$$

It can be proved that an SMO Algorithm using the constant-factor violating pair rule generates a sequence $\{\alpha^k\}$ such that $m(\alpha^k) - M(\alpha^k) \to 0$ for $k \to \infty$. Hence, for any $\varepsilon > 0$, an SMO algorithm of this type satisfies the stopping criterion (1.46) in a finite number of iterations.

1.10 Appendix A: Proof of existence and uniqueness of the optimal hyperplane

In this appendix we formally prove that the optimal hyperplane exists and is unique. To this aim we need some preliminary results.

Lemma 1.1. Let $H(\hat w, \hat b)$ be a separating hyperplane. Then

$$\rho(\hat w, \hat b) \ge \frac{1}{\|\hat w\|}.$$

Proof. Since $|\hat w^T x + \hat b| \ge 1$ for all $x \in A \cup B$, it follows that

$$\rho(\hat w, \hat b) = \min_{x \in A\cup B}\left\{\frac{|\hat w^T x + \hat b|}{\|\hat w\|}\right\} \ge \frac{1}{\|\hat w\|}.$$

Lemma 1.2. Given any separating hyperplane $H(\hat w, \hat b)$, there exists a separating hyperplane $H(\bar w, \bar b)$ such that

$$\rho(\hat w, \hat b) \le \rho(\bar w, \bar b) = \frac{1}{\|\bar w\|}. \qquad (1.47)$$

Moreover, there exist two points $x^+ \in A$ and $x^- \in B$ such that

$$\bar w^T x^+ + \bar b = 1, \qquad \bar w^T x^- + \bar b = -1. \qquad (1.48)$$

Proof. Let $\hat x^i \in A$ and $\hat x^j \in B$ be the closest points to $H(\hat w, \hat b)$, that is, the two points such that

$$\hat d_i = \frac{\hat w^T\hat x^i + \hat b}{\|\hat w\|} \le \frac{\hat w^T x^i + \hat b}{\|\hat w\|} \ \forall x^i \in A, \qquad \hat d_j = \frac{-(\hat w^T\hat x^j + \hat b)}{\|\hat w\|} \le \frac{-(\hat w^T x^j + \hat b)}{\|\hat w\|} \ \forall x^j \in B, \qquad (1.49)$$

from which it follows

$$\rho(\hat w, \hat b) = \min\{\hat d_i, \hat d_j\} \le \frac{1}{2}(\hat d_i + \hat d_j) = \frac{\hat w^T(\hat x^i - \hat x^j)}{2\|\hat w\|}. \qquad (1.50)$$

Let us consider the numbers $\alpha$ and $\beta$ such that

$$\alpha\,\hat w^T\hat x^i + \beta = 1, \qquad \alpha\,\hat w^T\hat x^j + \beta = -1, \qquad (1.51)$$

that is, the numbers

$$\alpha = \frac{2}{\hat w^T(\hat x^i - \hat x^j)}, \qquad \beta = -\frac{\hat w^T(\hat x^i + \hat x^j)}{\hat w^T(\hat x^i - \hat x^j)}.$$

It can be easily verified that $0 < \alpha \le 1$. We will show that the hyperplane $H(\bar w, \bar b) \equiv H(\alpha\hat w, \beta)$ is a separating hyperplane for the sets A and B, and that it is such that (1.47) holds. Indeed, using (1.49), we have

$$\hat w^T x^i \ge \hat w^T\hat x^i \ \forall x^i \in A, \qquad \hat w^T x^j \le \hat w^T\hat x^j \ \forall x^j \in B.$$

As $\alpha > 0$, we can write

$$\alpha\hat w^T x^i + \beta \ge \alpha\hat w^T\hat x^i + \beta = 1 \ \forall x^i \in A, \qquad \alpha\hat w^T x^j + \beta \le \alpha\hat w^T\hat x^j + \beta = -1 \ \forall x^j \in B, \qquad (1.52)$$

from which we get that $\bar w$ and $\bar b$ satisfy (1.2), and hence that $H(\bar w, \bar b)$ is a separating hyperplane for the sets A and B. Furthermore, taking into account (1.52) and the value of $\alpha$, we have

$$\rho(\bar w, \bar b) = \min_{x \in A\cup B}\left\{\frac{|\bar w^T x + \bar b|}{\|\bar w\|}\right\} = \frac{1}{\|\bar w\|} = \frac{1}{\alpha\|\hat w\|} = \frac{\hat w^T(\hat x^i - \hat x^j)}{2\|\hat w\|}.$$

Condition (1.47) follows from the above equality and (1.50). Using (1.51) we obtain that (1.48) holds with $x^+ = \hat x^i$ and $x^- = \hat x^j$.

Proposition 1.5. The following problem

$$\min \ \|w\|^2 \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B \qquad (1.53)$$

admits a unique solution $(w^*, b^*)$.

Proof. Let $F$ be the feasible set, that is,

$$F = \{(w, b) \in R^n\times R : w^T x^i + b \ge 1 \ \forall x^i \in A, \ w^T x^j + b \le -1 \ \forall x^j \in B\}.$$

Given any $(w_o, b_o) \in F$, let us consider the level set

$$L_o = \{(w, b) \in F : \|w\|^2 \le \|w_o\|^2\}.$$

The set $L_o$ is closed, and we will show that it is also bounded. To this aim, assume by contradiction that there exists an unbounded sequence $\{(w_k, b_k)\}$ belonging to $L_o$. Since $\|w_k\| \le \|w_o\|$ for all $k$, we must have $|b_k| \to \infty$. For any $k$ we can write

$$w_k^T x^i + b_k \ge 1 \ \forall x^i \in A, \qquad w_k^T x^j + b_k \le -1 \ \forall x^j \in B,$$

and hence, as $|b_k| \to \infty$, for $k$ sufficiently large we would have $\|w_k\|^2 > \|w_o\|^2$, and this contradicts the fact that $\{(w_k, b_k)\}$ belongs to $L_o$. Then $L_o$ is a compact set. Weierstrass' theorem implies that the function $\|w\|^2$ admits a minimum $(w^*, b^*)$ on $L_o$, and hence on $F$. As a consequence, $(w^*, b^*)$ is a solution of (1.53).

In order to prove that $(w^*, b^*)$ is the unique solution, assume by contradiction that there exists a pair $(\bar w, \bar b) \in F$, $(\bar w, \bar b) \ne (w^*, b^*)$, such that $\|\bar w\|^2 = \|w^*\|^2$. Suppose $\bar w \ne w^*$. The set $F$ is convex, so that

$$\lambda(w^*, b^*) + (1-\lambda)(\bar w, \bar b) \in F \quad \forall \lambda \in [0,1].$$

Since $\|w\|^2$ is a strictly convex function, for any $\lambda \in (0,1)$ it follows that

$$\|\lambda w^* + (1-\lambda)\bar w\|^2 < \lambda\|w^*\|^2 + (1-\lambda)\|\bar w\|^2.$$

Taking $\lambda = 1/2$, which corresponds to considering the pair $(\tilde w, \tilde b) \equiv \left(\frac{1}{2}w^* + \frac{1}{2}\bar w, \ \frac{1}{2}b^* + \frac{1}{2}\bar b\right)$, we have $(\tilde w, \tilde b) \in F$ and

$$\|\tilde w\|^2 < \frac{1}{2}\|w^*\|^2 + \frac{1}{2}\|\bar w\|^2 = \|w^*\|^2,$$

and this contradicts the fact that $(w^*, b^*)$ is a global minimum. Therefore, we must have $\bar w = w^*$. Assume $b^* > \bar b$ (the case $b^* < \bar b$ is analogous), and consider the point $\hat x^i \in A$ such that $(w^*)^T\hat x^i + b^* = 1$ (the existence of such a point follows from (1.48) of Lemma 1.2). We have

$$1 = (w^*)^T\hat x^i + b^* = \bar w^T\hat x^i + b^* > \bar w^T\hat x^i + \bar b,$$

and this contradicts the fact that $\bar w^T x^i + \bar b \ge 1$ for all $x^i \in A$. As a consequence, we must have $b^* = \bar b$, and hence the uniqueness of the solution is proved.

Proposition 1.6. Let $(w^*, b^*)$ be the solution of (1.53). Then $(w^*, b^*)$ is the unique solution of the following problem

$$\max \ \rho(w, b) \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B, \qquad (1.54)$$

and hence it is the optimal hyperplane.

Proof. We observe that $(w^*, b^*)$ is the unique solution of the problem

$$\max \ \frac{1}{\|w\|} \quad \text{s.t.} \quad w^T x^i + b \ge 1 \ \forall x^i \in A, \quad w^T x^j + b \le -1 \ \forall x^j \in B.$$

Lemma 1.1 and Lemma 1.2 imply that, for any separating hyperplane $H(w, b)$, we have

$$\frac{1}{\|w\|} \le \rho(w, b) \le \frac{1}{\|w^*\|},$$

and hence, for the separating hyperplane $H(w^*, b^*)$ we obtain $\rho(w^*, b^*) = \dfrac{1}{\|w^*\|}$, which implies that $H(w^*, b^*)$ is the optimal separating hyperplane.

1.11 Appendix B: the Wolfe dual

The idea underlying duality theory is that of associating with a given minimum problem, called the primal problem,

$$\min_{x \in S} f(x),$$

a maximum problem, called the dual problem,

$$\max_{u \in U} \psi(u),$$

in such a way that (at least) the following weak duality property holds:

$$\inf_{x \in S} f(x) \ge \sup_{u \in U} \psi(u).$$

Whenever the above property holds, it is possible to get useful information on the solutions of the primal problem by means of the analysis of the dual problem. For some classes of problems it is possible to state, under suitable assumptions, the following strong duality property:

$$\inf_{x \in S} f(x) = \sup_{u \in U} \psi(u).$$

Let us consider the problem

$$\begin{aligned}\min \ & f(x) \\ & g_i(x) \le 0, \quad i = 1,\ldots,m \\ & c_j^T x - d_j = 0, \quad j = 1,\ldots,p,\end{aligned} \qquad (1.55)$$

where $f : R^n \to R$ and $g_i : R^n \to R$, $i = 1,\ldots,m$, are convex, continuously differentiable functions. Let

$$L(x,\lambda,\mu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j(c_j^T x - d_j)$$

be the Lagrangian.

Proposition 1.7. Assume that problem (1.55) admits a solution $x^*$ and that there exists a pair of Lagrange multipliers $(\lambda^*, \mu^*)$. Then $(x^*, \lambda^*, \mu^*)$ is a solution of the following problem

$$\begin{aligned}\max_{x,\lambda,\mu} \ & L(x,\lambda,\mu) \\ & \nabla_x L(x,\lambda,\mu) = 0 \\ & \lambda \ge 0.\end{aligned} \qquad (1.56)$$

Moreover, the duality gap is null, that is, $f(x^*) = L(x^*, \lambda^*, \mu^*)$.

Proof. The KKT conditions imply

$$\nabla_x L(x^*, \lambda^*, \mu^*) = 0, \qquad (1.57)$$
$$(\lambda^*)^T g(x^*) = 0, \quad \lambda^* \ge 0.$$

The point $(x^*, \lambda^*, \mu^*)$ is a feasible point of problem (1.56), and we have $f(x^*) = L(x^*, \lambda^*, \mu^*)$. We now show that $(x^*, \lambda^*, \mu^*)$ is a solution of problem (1.56). Let $(x, \lambda, \mu)$ be a feasible point of problem (1.56), that is, such that $\nabla_x L(x,\lambda,\mu) = 0$ and $\lambda \ge 0$. For each $\lambda \ge 0$ and for each $\mu$, the function of $x$

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j(c_j^T x - d_j)$$

is a convex function. Indeed, $f$ is a convex function; the second term is a linear combination, with nonnegative coefficients, of convex functions, so that it is a convex function; the third term is an affine function. Therefore, $L(x,\lambda,\mu)$ is the sum of convex functions and hence is convex.

From (1.57), recalling that $g(x^*) \le 0$ and $\lambda \ge 0$, taking into account that $L$, as a function of $x$, is convex (so that $L(y) \ge L(x) + (y-x)^T\nabla_x L(x)$), and using the condition $\nabla_x L = 0$, for each $\lambda \ge 0$ and for each $\mu$ we can write

$$L(x^*,\lambda^*,\mu^*) = f(x^*) \ge f(x^*) + \sum_{i=1}^m \lambda_i g_i(x^*) = L(x^*,\lambda,\mu) \ge L(x,\lambda,\mu) + (x^* - x)^T\nabla_x L(x,\lambda,\mu) = L(x,\lambda,\mu),$$

which proves the thesis.

Problem (1.56) is usually referred to as the Wolfe dual. We observe that, in the general case, given a solution $(\bar x, \bar\lambda, \bar\mu)$ of the Wolfe dual, we cannot state that $\bar x$ is a solution of the primal problem and that $(\bar\lambda, \bar\mu)$ is a pair of Lagrange multipliers.

1.11.1 The Wolfe dual for quadratic programming

Let us consider the following quadratic programming problem

$$\min \ f(x) = \frac{1}{2}x^T Qx + c^T x \quad \text{s.t.} \quad Ax - b \le 0, \qquad (1.58)$$

where $Q \in R^{n\times n}$ is symmetric, $A \in R^{m\times n}$, $c \in R^n$, $b \in R^m$. Letting $L(x,\lambda) = f(x) + \lambda^T(Ax - b)$, the Wolfe dual is defined as follows

$$\begin{aligned}\max_{x,\lambda} \ & L(x,\lambda) \\ & \nabla_x L(x,\lambda) = 0 \\ & \lambda \ge 0.\end{aligned} \qquad (1.59)$$

We can state the following result.

Proposition 1.8. Assume that $Q$ is a symmetric positive semidefinite matrix. Let $(\bar x, \bar\lambda)$ be a solution of the Wolfe dual (1.59). Then there exists a vector $x^*$ (not necessarily equal to $\bar x$) such that

(i) $Q(x^* - \bar x) = 0$;
(ii) $x^*$ is a solution of problem (1.58);
(iii) $(x^*, \bar\lambda)$ is a pair (global minimum, vector of Lagrange multipliers).

Proof. Let us consider the dual problem (1.59):

$$\begin{aligned}\max_{x,\lambda} \ & \frac{1}{2}x^T Qx + c^T x + \lambda^T(Ax - b) \\ & Qx + A^T\lambda + c = 0 \\ & \lambda \ge 0.\end{aligned}$$

The constraint $Qx + A^T\lambda + c = 0$ implies

$$x^T Qx + c^T x + \lambda^T Ax = 0. \qquad (1.60)$$

Using (1.60), the dual problem can be rewritten in the form

$$\begin{aligned}\min_{x,\lambda} \ & \frac{1}{2}x^T Qx + \lambda^T b \\ & Qx + A^T\lambda + c = 0 \\ & \lambda \ge 0.\end{aligned} \qquad (1.61)$$

Let $(\bar x, \bar\lambda)$ be a solution of (1.61). Consider the Lagrangian of problem (1.61):

$$W(x,\lambda,v,z) = \frac{1}{2}x^T Qx + \lambda^T b - v^T(Qx + A^T\lambda + c) - z^T\lambda.$$

Since $(\bar x, \bar\lambda)$ is a solution of the Wolfe dual (1.59), from the KKT conditions we get that there exist vectors $\bar v \in R^n$ and $\bar z \in R^m$ such that

$$\begin{aligned} & \nabla_x W = Q\bar x - Q\bar v = 0 \\ & \nabla_\lambda W = b - A\bar v - \bar z = 0 \\ & Q\bar x + A^T\bar\lambda + c = 0 \\ & \bar z^T\bar\lambda = 0 \\ & \bar z \ge 0, \quad \bar\lambda \ge 0. \end{aligned} \qquad (1.62)$$

From the second and fifth conditions we have $\bar z = b - A\bar v \ge 0$, and hence the above conditions can be rewritten as follows

$$\begin{aligned} & Q\bar x - Q\bar v = 0 \\ & -b + A\bar v \le 0 \\ & Q\bar x + A^T\bar\lambda + c = 0 \\ & -\bar\lambda^T b + \bar\lambda^T A\bar v = 0 \\ & \bar\lambda \ge 0. \end{aligned} \qquad (1.63)$$

By subtracting the first condition from the third condition we obtain

$$Q\bar v + A^T\bar\lambda + c = 0. \qquad (1.64)$$

As the matrix $Q$ is positive semidefinite, the function $f$ is convex. From the optimality conditions in the case of linear constraints it follows that the conditions

$$A\bar v - b \le 0, \qquad Q\bar v + A^T\bar\lambda + c = 0, \qquad \bar\lambda^T(A\bar v - b) = 0, \qquad \bar\lambda \ge 0$$

are sufficient to ensure that $\bar v$ is a solution of problem (1.58). Therefore, by definition, $(\bar v, \bar\lambda)$ is a pair (global minimum, vector of Lagrange multipliers). Letting $x^* = \bar v$, we have that $x^*$ is a solution of problem (1.58); moreover, using the first condition of (1.63), we obtain

$$Qx^* = Q\bar v = Q\bar x.$$

Hence, assertions (i)-(iii) are proved.

1.12 Appendix C: Optimality conditions

We prove here Proposition 1.3. To this aim we state some results based on the manipulation of the KKT conditions. Since $f$ is convex and the constraints are linear, a feasible point $\alpha^*$ is a solution of problem (1.27) if and only if the KKT conditions hold. Let us introduce the Lagrangian

$$L(\alpha,\lambda,\xi,\hat\xi) = \frac{1}{2}\alpha^T Q\alpha - e^T\alpha + \lambda\, y^T\alpha - \xi^T\alpha + \hat\xi^T(\alpha - Ce),$$

where $\alpha \in R^\ell$, $\lambda \in R$, $\xi, \hat\xi \in R^\ell$.

Proposition 1.9. A feasible point $\alpha^*$ is a solution of problem (1.27) if and only if there exists a scalar $\lambda^*$ such that

$$(\nabla f(\alpha^*))_i + \lambda^* y_i \begin{cases} \ge 0 & \text{if } i \in L(\alpha^*) \\ \le 0 & \text{if } i \in U(\alpha^*) \\ = 0 & \text{if } i \notin L(\alpha^*) \cup U(\alpha^*). \end{cases} \qquad (1.65)$$

Proof. Since $f$ is convex and the constraints are linear, a feasible point $\alpha^*$ is a solution of problem (1.27) if and only if there exist Lagrange multipliers $\lambda^* \in R$, $\xi^*, \hat\xi^* \in R^\ell$ such that

$$\nabla f(\alpha^*) + \lambda^* y - \xi^* + \hat\xi^* = 0, \qquad (1.66)$$
$$(\xi^*)^T\alpha^* = 0, \qquad (1.67)$$
$$(\hat\xi^*)^T(\alpha^* - Ce) = 0, \qquad (1.68)$$
$$\xi^*, \hat\xi^* \ge 0. \qquad (1.69)$$

We will show that conditions (1.66)-(1.69) are satisfied if and only if (1.65) holds.

(a) Suppose that $\alpha^*$ is a feasible point and that (1.66)-(1.69) hold. Let $\alpha_i^* = 0$, so that $i \in L(\alpha^*)$. The complementarity condition (1.68) implies $\hat\xi_i^* = 0$, and hence, using (1.66) and (1.69), we obtain

$$(\nabla f(\alpha^*))_i + \lambda^* y_i = \xi_i^* \ge 0.$$

In the same way, let $\alpha_i^* = C$, so that $i \in U(\alpha^*)$. The complementarity condition (1.67) implies $\xi_i^* = 0$, and hence, using (1.66) and (1.69), we obtain

$$(\nabla f(\alpha^*))_i + \lambda^* y_i = -\hat\xi_i^* \le 0.$$

Finally, assume $0 < \alpha_i^* < C$, so that $i \notin L(\alpha^*)\cup U(\alpha^*)$. The complementarity conditions (1.67) and (1.68) imply $\xi_i^* = \hat\xi_i^* = 0$, and hence, using (1.66), we obtain

$$(\nabla f(\alpha^*))_i + \lambda^* y_i = 0.$$

(b) Suppose that $\alpha^*$ is a feasible point and that (1.65) holds. For $i = 1,\ldots,\ell$:

- if $\alpha_i^* = 0$ then set $\hat\xi_i^* = 0$ and $\xi_i^* = (\nabla f(\alpha^*))_i + \lambda^* y_i$;
- if $\alpha_i^* = C$ then set $\xi_i^* = 0$ and $\hat\xi_i^* = -\left[(\nabla f(\alpha^*))_i + \lambda^* y_i\right]$;
- if $0 < \alpha_i^* < C$ then set $\xi_i^* = 0$ and $\hat\xi_i^* = 0$.

It can be easily verified that conditions (1.66)-(1.69) are satisfied.

From Proposition 1.9 the next result follows.

Proposition 1.10. A point $\alpha^* \in F$ is a solution of problem (1.27) if and only if there exists a scalar $\lambda^*$ such that

$$\begin{aligned} \lambda^* &\ge -\frac{(\nabla f(\alpha^*))_i}{y_i} \quad \forall i \in L^+(\alpha^*)\cup U^-(\alpha^*) \\ \lambda^* &\le -\frac{(\nabla f(\alpha^*))_i}{y_i} \quad \forall i \in L^-(\alpha^*)\cup U^+(\alpha^*) \\ \lambda^* &= -\frac{(\nabla f(\alpha^*))_i}{y_i} \quad \forall i \notin L(\alpha^*)\cup U(\alpha^*). \end{aligned} \qquad (1.70)$$

We are now ready to prove Proposition 1.3.

Proof of Proposition 1.3. (a) Assume that the feasible point $\alpha^*$ is a solution. Proposition 1.10 implies that there exists a multiplier $\lambda^*$ such that the pair $(\alpha^*, \lambda^*)$ satisfies conditions (1.70). The latter can be rewritten as follows

$$\max_{i \in L^+(\alpha^*)\cup U^-(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\} \le \lambda^* \le \min_{i \in L^-(\alpha^*)\cup U^+(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\}, \qquad \lambda^* = -\frac{(\nabla f(\alpha^*))_i}{y_i} \ \forall i \notin L(\alpha^*)\cup U(\alpha^*).$$

From the definition of the sets $R(\alpha^*)$ and $S(\alpha^*)$ we obtain

$$\max_{i \in R(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\} \le \min_{j \in S(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_j}{y_j}\right\},$$

and hence (1.34) is verified.

(b) Assume that (1.34) holds. We can define a multiplier $\lambda^*$ such that

$$\max_{i \in R(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\} \le \lambda^* \le \min_{i \in S(\alpha^*)} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\}. \qquad (1.71)$$

Then the inequalities of (1.70) hold. The definition of $R(\alpha^*)$ and $S(\alpha^*)$, and the choice of the multiplier $\lambda^*$ (which allows us to satisfy (1.71)), imply

$$\max_{\{i : 0 < \alpha^*_i < C\}} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\} \le \lambda^* \le \min_{\{i : 0 < \alpha^*_i < C\}} \left\{-\frac{(\nabla f(\alpha^*))_i}{y_i}\right\}.$$

Therefore, the equalities of (1.70) hold and the thesis is proved.


Course 2BA1, Section 11: Periodic Functions and Fourier Series Course BA, 8 9 Section : Periodic Functions and Fourier Series David R. Wikins Copyright c David R. Wikins 9 Contents Periodic Functions and Fourier Series 74. Fourier Series of Even and Odd Functions...........

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

Semidefinite relaxation and Branch-and-Bound Algorithm for LPECs

Semidefinite relaxation and Branch-and-Bound Algorithm for LPECs Semidefinite reaxation and Branch-and-Bound Agorithm for LPECs Marcia H. C. Fampa Universidade Federa do Rio de Janeiro Instituto de Matemática e COPPE. Caixa Posta 68530 Rio de Janeiro RJ 21941-590 Brasi

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

Lecture 6: Moderately Large Deflection Theory of Beams

Lecture 6: Moderately Large Deflection Theory of Beams Structura Mechanics 2.8 Lecture 6 Semester Yr Lecture 6: Moderatey Large Defection Theory of Beams 6.1 Genera Formuation Compare to the cassica theory of beams with infinitesima deformation, the moderatey

More information

14 Separation of Variables Method

14 Separation of Variables Method 14 Separation of Variabes Method Consider, for exampe, the Dirichet probem u t = Du xx < x u(x, ) = f(x) < x < u(, t) = = u(, t) t > Let u(x, t) = T (t)φ(x); now substitute into the equation: dt

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

MA 201: Partial Differential Equations Lecture - 10

MA 201: Partial Differential Equations Lecture - 10 MA 201: Partia Differentia Equations Lecture - 10 Separation of Variabes, One dimensiona Wave Equation Initia Boundary Vaue Probem (IBVP) Reca: A physica probem governed by a PDE may contain both boundary

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

Multicategory Classification by Support Vector Machines

Multicategory Classification by Support Vector Machines Muticategory Cassification by Support Vector Machines Erin J Bredensteiner Department of Mathematics University of Evansvie 800 Lincon Avenue Evansvie, Indiana 47722 eb6@evansvieedu Kristin P Bennett Department

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

SEMINAR 2. PENDULUMS. V = mgl cos θ. (2) L = T V = 1 2 ml2 θ2 + mgl cos θ, (3) d dt ml2 θ2 + mgl sin θ = 0, (4) θ + g l

SEMINAR 2. PENDULUMS. V = mgl cos θ. (2) L = T V = 1 2 ml2 θ2 + mgl cos θ, (3) d dt ml2 θ2 + mgl sin θ = 0, (4) θ + g l Probem 7. Simpe Penduum SEMINAR. PENDULUMS A simpe penduum means a mass m suspended by a string weightess rigid rod of ength so that it can swing in a pane. The y-axis is directed down, x-axis is directed

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

Lecture Notes 4: Fourier Series and PDE s

Lecture Notes 4: Fourier Series and PDE s Lecture Notes 4: Fourier Series and PDE s 1. Periodic Functions A function fx defined on R is caed a periodic function if there exists a number T > such that fx + T = fx, x R. 1.1 The smaest number T for

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

Available online at ScienceDirect. IFAC PapersOnLine 50-1 (2017)

Available online at   ScienceDirect. IFAC PapersOnLine 50-1 (2017) Avaiabe onine at www.sciencedirect.com ScienceDirect IFAC PapersOnLine 50-1 (2017 3412 3417 Stabiization of discrete-time switched inear systems: Lyapunov-Metzer inequaities versus S-procedure characterizations

More information

Approximate Bandwidth Allocation for Fixed-Priority-Scheduled Periodic Resources (WSU-CS Technical Report Version)

Approximate Bandwidth Allocation for Fixed-Priority-Scheduled Periodic Resources (WSU-CS Technical Report Version) Approximate Bandwidth Aocation for Fixed-Priority-Schedued Periodic Resources WSU-CS Technica Report Version) Farhana Dewan Nathan Fisher Abstract Recent research in compositiona rea-time systems has focused

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this

More information

Indirect Optimal Control of Dynamical Systems

Indirect Optimal Control of Dynamical Systems Computationa Mathematics and Mathematica Physics, Vo. 44, No. 3, 24, pp. 48 439. Transated from Zhurna Vychisite noi Matematiki i Matematicheskoi Fiziki, Vo. 44, No. 3, 24, pp. 444 466. Origina Russian

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

Restricted weak type on maximal linear and multilinear integral maps.

Restricted weak type on maximal linear and multilinear integral maps. Restricted weak type on maxima inear and mutiinear integra maps. Oscar Basco Abstract It is shown that mutiinear operators of the form T (f 1,..., f k )(x) = R K(x, y n 1,..., y k )f 1 (y 1 )...f k (y

More information

Nonlinear Optimization and Support Vector Machines

Nonlinear Optimization and Support Vector Machines Noname manuscript No. (will be inserted by the editor) Nonlinear Optimization and Support Vector Machines Veronica Piccialli Marco Sciandrone Received: date / Accepted: date Abstract Support Vector Machine

More information

#A48 INTEGERS 12 (2012) ON A COMBINATORIAL CONJECTURE OF TU AND DENG

#A48 INTEGERS 12 (2012) ON A COMBINATORIAL CONJECTURE OF TU AND DENG #A48 INTEGERS 12 (2012) ON A COMBINATORIAL CONJECTURE OF TU AND DENG Guixin Deng Schoo of Mathematica Sciences, Guangxi Teachers Education University, Nanning, P.R.China dengguixin@ive.com Pingzhi Yuan

More information

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY SIAM J. OPTIM. Vo. 7, No. 3, pp. 1637 1665 c 017 Society for Industria and Appied Mathematics DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY MIJU AHN, JONG-SHI PANG,

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

Minimizing Total Weighted Completion Time on Uniform Machines with Unbounded Batch

Minimizing Total Weighted Completion Time on Uniform Machines with Unbounded Batch The Eighth Internationa Symposium on Operations Research and Its Appications (ISORA 09) Zhangiaie, China, September 20 22, 2009 Copyright 2009 ORSC & APORC, pp. 402 408 Minimizing Tota Weighted Competion

More information

Tight Approximation Algorithms for Maximum Separable Assignment Problems

Tight Approximation Algorithms for Maximum Separable Assignment Problems MATHEMATICS OF OPERATIONS RESEARCH Vo. 36, No. 3, August 011, pp. 416 431 issn 0364-765X eissn 156-5471 11 3603 0416 10.187/moor.1110.0499 011 INFORMS Tight Approximation Agorithms for Maximum Separabe

More information

1D Heat Propagation Problems

1D Heat Propagation Problems Chapter 1 1D Heat Propagation Probems If the ambient space of the heat conduction has ony one dimension, the Fourier equation reduces to the foowing for an homogeneous body cρ T t = T λ 2 + Q, 1.1) x2

More information

Universal Consistency of Multi-Class Support Vector Classification

Universal Consistency of Multi-Class Support Vector Classification Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the

More information

2M2. Fourier Series Prof Bill Lionheart

2M2. Fourier Series Prof Bill Lionheart M. Fourier Series Prof Bi Lionheart 1. The Fourier series of the periodic function f(x) with period has the form f(x) = a 0 + ( a n cos πnx + b n sin πnx ). Here the rea numbers a n, b n are caed the Fourier

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

Mat 1501 lecture notes, penultimate installment

Mat 1501 lecture notes, penultimate installment Mat 1501 ecture notes, penutimate instament 1. bounded variation: functions of a singe variabe optiona) I beieve that we wi not actuay use the materia in this section the point is mainy to motivate the

More information

Math 124B January 31, 2012

Math 124B January 31, 2012 Math 124B January 31, 212 Viktor Grigoryan 7 Inhomogeneous boundary vaue probems Having studied the theory of Fourier series, with which we successfuy soved boundary vaue probems for the homogeneous heat

More information

An explicit resolution of the equity-efficiency tradeoff in the random allocation of an indivisible good

An explicit resolution of the equity-efficiency tradeoff in the random allocation of an indivisible good An expicit resoution of the equity-efficiency tradeoff in the random aocation of an indivisibe good Stergios Athanassogou, Gauthier de Maere d Aertrycke January 2015 Abstract Suppose we wish to randomy

More information

SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY

SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY TOM DUCHAMP, GANG XIE, AND THOMAS YU Abstract. This paper estabishes smoothness resuts for a cass of

More information

Math 220B - Summer 2003 Homework 1 Solutions

Math 220B - Summer 2003 Homework 1 Solutions Math 0B - Summer 003 Homework Soutions Consider the eigenvaue probem { X = λx 0 < x < X satisfies symmetric BCs x = 0, Suppose f(x)f (x) x=b x=a 0 for a rea-vaued functions f(x) which satisfy the boundary

More information

Statistics for Applications. Chapter 7: Regression 1/43

Statistics for Applications. Chapter 7: Regression 1/43 Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)

More information

More Scattering: the Partial Wave Expansion

More Scattering: the Partial Wave Expansion More Scattering: the Partia Wave Expansion Michae Fower /7/8 Pane Waves and Partia Waves We are considering the soution to Schrödinger s equation for scattering of an incoming pane wave in the z-direction

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

An explicit Jordan Decomposition of Companion matrices

An explicit Jordan Decomposition of Companion matrices An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057

More information

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

Sample Problems for Third Midterm March 18, 2013

Sample Problems for Third Midterm March 18, 2013 Mat 30. Treibergs Sampe Probems for Tird Midterm Name: Marc 8, 03 Questions 4 appeared in my Fa 000 and Fa 00 Mat 30 exams (.)Let f : R n R n be differentiabe at a R n. (a.) Let g : R n R be defined by

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

18-660: Numerical Methods for Engineering Design and Optimization

18-660: Numerical Methods for Engineering Design and Optimization 8-660: Numerica Methods for Engineering esign and Optimization in i epartment of ECE Carnegie Meon University Pittsburgh, PA 523 Side Overview Conjugate Gradient Method (Part 4) Pre-conditioning Noninear

More information

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC (January 8, 2003) A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC DAMIAN CLANCY, University of Liverpoo PHILIP K. POLLETT, University of Queensand Abstract

More information

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE KATIE L. MAY AND MELISSA A. MITCHELL Abstract. We show how to identify the minima path network connecting three fixed points on

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR 1 Maximizing Sum Rate and Minimizing MSE on Mutiuser Downink: Optimaity, Fast Agorithms and Equivaence via Max-min SIR Chee Wei Tan 1,2, Mung Chiang 2 and R. Srikant 3 1 Caifornia Institute of Technoogy,

More information

Completion. is dense in H. If V is complete, then U(V) = H.

Completion. is dense in H. If V is complete, then U(V) = H. Competion Theorem 1 (Competion) If ( V V ) is any inner product space then there exists a Hibert space ( H H ) and a map U : V H such that (i) U is 1 1 (ii) U is inear (iii) UxUy H xy V for a xy V (iv)

More information