A NEW ALGORITHM FOR FINDING THE MINIMUM DISTANCE BETWEEN TWO CONVEX HULLS

Dougsoo Kaown, B.Sc., M.Sc.

Dissertation Prepared for the Degree of DOCTOR OF PHILOSOPHY

UNIVERSITY OF NORTH TEXAS, May 2009

APPROVED: Jianguo Liu, Major Professor; John Neuberger, Committee Member; Xiaohu Yuan, Committee Member; Matthew Douglass, Chair of the Department of Mathematics; Michael Monticino, Interim Dean of the Robert B. Toulouse School of Graduate Studies
Kaown, Dougsoo. A New Algorithm for Finding the Minimum Distance between Two Convex Hulls. Doctor of Philosophy (Mathematics), May 2009, 66 pp., 24 illustrations, bibliography, 27 titles.

The problem of computing the minimum distance between two convex hulls has applications to many areas including robotics, computer graphics and path planning. Moreover, determining the minimum distance between two convex hulls plays a significant role in support vector machines (SVM). In this study, a new algorithm for finding the minimum distance between two convex hulls is proposed and investigated. Convergence of the algorithm is proved, and its applicability to support vector machines is demonstrated. The performance of the new algorithm is compared with that of one of the most popular algorithms, the sequential minimal optimization (SMO) method. The new algorithm is simple to understand, easy to implement, and can be more efficient than the SMO method for many SVM problems.
Copyright 2009 by Dougsoo Kaown
ACKNOWLEDGEMENTS

I thank my advisor Dr. Jianguo Liu, who gave me endless support and guidance; there are no words to describe his support and guidance. I thank my PhD committee members Dr. Neuberger and Dr. Yuan. I thank my father: even though he has already left our family, I always think about him. Without my father, my mother raised me with her heart and with endless love. I thank my mother; she always supports me. I also thank my wife and my daughter, who always love me and are with me. Without them, I could not have finished this pleasant and wonderful journey. I thank the Department of Mathematics for giving me this opportunity and supporting me on this journey. I thank all staff and faculty members in the department; they are my lifetime teachers.
CONTENTS

ACKNOWLEDGEMENTS

CHAPTER 1. INTRODUCTION
1.1. Introduction and Discussion of the Problem
1.2. Previous Works 4

CHAPTER 2. A NEW ALGORITHM FOR FINDING THE MINIMUM DISTANCE BETWEEN TWO CONVEX HULLS
2.1. A New Algorithm for Finding the Minimum Distance of Two Convex Hulls
2.2. Convergence for Algorithm 2.1

CHAPTER 3. SUPPORT VECTOR MACHINES
3.1. Introduction for Support Vector Machines (SVM)
3.2. SVM for Linearly Separable Sets
3.3. SVM Dual Formulation
3.4. Linearly Nonseparable Training Set
3.5. Kernel SVM 32

CHAPTER 4. SOLVING SUPPORT VECTOR MACHINES USING THE NEW ALGORITHM
Solving SVM Problems Using the New Algorithm
Stopping Method for the Algorithm 37

CHAPTER 5. EXPERIMENTS
Comparison with Linearly Separable Sets
Comparison with Real World Problems 39

CHAPTER 6. CONCLUSION AND FUTURE WORKS 56
6.1. Conclusion
6.2. Future Work 56

BIBLIOGRAPHY 64
CHAPTER 1
INTRODUCTION

1.1. Introduction and Discussion of the Problem

Finding the minimum distance between two convex hulls is a popular research topic with important real-world applications. The fields of robotics, animation, computer graphics and path planning all use this distance calculation. For example, in robotics, the distance between two convex hulls is calculated to determine the path of the robot so that it can avoid collisions. However, computing the minimum distance between two convex hulls plays its most significant role in support vector machines (SVM) [6], [7], [10], [15]. SVM is a very useful classification tool, and it owes its popularity to its impressive performance on many real-world problems. Many fast iterative methods have been introduced to solve SVM problems; the popular sequential minimal optimization (SMO) method is one of them. There are also many algorithms for finding the minimum distance between two convex hulls, including the Gilbert algorithm and the Mitchell–Demyanov–Malozemov (MDM) algorithm. These two algorithms are the most basic algorithms for solving SVM problems geometrically, and many published papers build on them. Keerthi's paper [2] is an important example: he uses these two algorithms to produce a new hybrid algorithm, and he shows that SVM problems can be transformed into computing the minimum distance between two convex hulls.

Definition 1.1. Let {x_1, ..., x_m} be a set of vectors in R^n. A convex combination of {x_1, ..., x_m} is given by

(1) Σ_i α_i x_i, where Σ_i α_i = 1 and α_i ≥ 0.
Definition 1.2. For a given set X = {x_1, ..., x_m}, co X denotes the convex hull of X: the set of all convex combinations of elements of X,

(2) co X = {Σ_i α_i x_i | Σ_i α_i = 1, α_i ≥ 0}.

Gilbert's algorithm and the MDM algorithm are easy to understand and implement. However, these two algorithms are designed for finding the point of a given convex hull nearest to the origin. Let X = {x_1, ..., x_m} and Y = {y_1, ..., y_s} be finite point sets in R^n, and let U and V denote the convex hulls generated by X and Y, respectively:

U = {u = Σ_i α_i x_i | Σ_i α_i = 1, α_i ≥ 0}
V = {v = Σ_j β_j y_j | Σ_j β_j = 1, β_j ≥ 0}

The Gilbert and MDM algorithms find the solution of the following problem:

min ‖u‖ subject to u = Σ_{i=1}^m α_i x_i, Σ_i α_i = 1, α_i ≥ 0.

This problem is called the minimal norm problem (MNP). The solution of the MNP above is the point of U nearest to the origin. For x, y ∈ R^n, the Euclidean norm is defined by ‖x‖ = √(x, x), where the inner product is (x, y) = xᵀy = x_1 y_1 + ... + x_n y_n.

The minimum distance between two convex hulls can be found by solving an optimization problem called the nearest point problem (NPP); an NPP instance can be seen in Figure 1.1. Using Gilbert's algorithm or the MDM algorithm, one can find the minimum distance between two convex hulls by finding the point nearest to the origin of the convex hull generated by the differences of vectors.

Nearest Point Problem (NPP): min ‖u − v‖
subject to u = Σ_{i=1}^m α_i x_i, v = Σ_{j=1}^s β_j y_j, Σ_{i=1}^m α_i = 1, α_i ≥ 0, Σ_{j=1}^s β_j = 1, β_j ≥ 0.

Figure 1.1. Nearest Point Problem

Set Z = {x_i − y_j | x_i ∈ X and y_j ∈ Y}, and let W = co Z:

W = {w = Σ_l γ_l z_l | Σ_l γ_l = 1, γ_l ≥ 0}

Then the minimum distance between two convex hulls can be found by solving the following MNP:

min ‖w‖ subject to w = Σ_l γ_l z_l, Σ_l γ_l = 1, γ_l ≥ 0.

However, computing the minimum distance between two convex hulls with such algorithms requires large memory storage, because the convex hull W has m·s vertices. For example, let X = {x_1, x_2} and Y = {y_1, y_2, y_3};
then Z = X − Y = {x_1 − y_1, x_1 − y_2, x_1 − y_3, x_2 − y_1, x_2 − y_2, x_2 − y_3}. This means that if one set has 1000 points and the other set has 1000 points, then 1,000,000 points will be used in the Gilbert or the MDM method, and the calculation becomes too expensive. Thus, a direct method that avoids forming the differences of vectors is desired. This dissertation presents such a direct method for computing the minimum distance between two convex hulls. Chapter 2 proposes a new algorithm for solving the NPP and proves its convergence. Chapter 3 introduces support vector machines (SVM). Chapter 4 discusses solving SVM using the new algorithm. Chapter 5 shows experimental results comparing SMO and the new algorithm. Finally, the conclusion and future work for the new algorithm are discussed in Chapter 6.

1.2. Previous Works

In this section, two important iterative algorithms for finding the point of a given convex hull nearest to the origin are discussed, beginning with Gilbert's algorithm [5]. Gilbert's algorithm was the first algorithm developed for finding the minimum distance to the origin from a convex hull, and it is a geometric method that can be explained using the convex hull representation. First, let X = {x_1, ..., x_m} and U = {u = Σ_i α_i x_i | Σ_i α_i = 1, α_i ≥ 0}.

Definition 1.3. For u ∈ U, define

(3) δ_MDM(u) = (u, u) − min_i (x_i, u)

Theorem 1 in [1] states that δ_MDM(u) = 0 if and only if u is the point nearest to the origin.

Gilbert's algorithm
Step 1. Choose u_k ∈ U.
Step 2. Find x_k with (x_k, u_k) = min_i (x_i, u_k).
Step 3. If δ_MDM(u_k) = 0, stop. Otherwise, let u_{k+1} be the point of minimum norm on the line segment joining u_k and x_k.
Step 4. Go to Step 2.

In Step 3, the point of minimum norm on the segment joining u_k and x_k can be found in the following way:

u_k, if (−u_k, x_k − u_k) ≤ 0;
x_k, if ‖x_k − u_k‖² ≤ (−u_k, x_k − u_k);
u_k + ((−u_k, x_k − u_k)/‖x_k − u_k‖²)(x_k − u_k), otherwise.

Gilbert's algorithm tends to converge fast at the beginning, but it becomes very slow as it approaches the solution, because it keeps zigzagging until it finds the solution. Exploiting the fast initial convergence, Keerthi [2] used it as a component when he built his hybrid algorithm.

After Gilbert's algorithm, Mitchell, Demyanov and Malozemov (MDM) suggested another geometric, iterative algorithm to find the point nearest to the origin. Let X and U be as above, and let u* be the point nearest to the origin. Define (x', u) = max_{α_i > 0} (x_i, u), where α_i is the coefficient of x_i in the convex hull representation, and (x'', u) = min_i (x_i, u).

Definition 1.4. Define

(4) Δ_MDM(u) = (x', u) − (x'', u)

Note that

(5) (x', u) − (x'', u) = (x' − x'', u)

Lemma 2 in [1] states that α' Δ_MDM(u) ≤ δ_MDM(u) ≤ Δ_MDM(u), where α' is the coefficient of x' in the convex hull representation.
Then, by Theorem 1 and Lemma 2 in [1], u* is the point nearest to the origin if and only if Δ_MDM(u*) = 0. The algorithm for finding u* is the following.

The MDM Algorithm
Step 1. Choose u_k ∈ U.
Step 2. Find x'_k and x''_k.
Step 3. Set u_k(t) = u_k + t α'_k (x''_k − x'_k), where α'_k is the coefficient of x'_k and t ∈ [0, 1].
Step 4. Set u_{k+1} = u_k + t_k α'_k (x''_k − x'_k), where (u_k(t_k), u_k(t_k)) = min_{t∈[0,1]} (u_k(t), u_k(t)).
Step 5. If Δ_MDM,k(u_k) = (x'_k − x''_k, u_k) = 0, stop. Otherwise, go to Step 2.

The convergence was proved in [1], and this algorithm is also very simple to implement. In this algorithm, u_k needs to stay in the convex hull: the algorithm adds t α'_k x''_k and subtracts t α'_k x'_k from the current point, so that the coefficients still sum to 1. It may take some time to see why only nonzero coefficients α_i are considered when finding x'_k, but the MDM algorithm has a good reason to use x'_k when the current point moves: if max_i (x_i, u) were used instead of max_{α_i > 0} (x_i, u), u_k might converge slowly or might not converge to u* at all. The step size t_k α'_k decides how far the current point moves, and x''_k − x'_k gives the direction for u_k. Note that in each iteration u_k updates two coefficients of the representation; by extending this method, four or more coefficients can be updated at a time, as shown in Appendix A.
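As a concrete illustration, the MDM iteration above can be sketched in a few lines of Python. This is a hypothetical toy implementation written for this summary (the helper names and the barycenter starting point are choices made here, not taken from the dissertation); it tracks the coefficient vector α explicitly, since the maximum in Step 2 ranges only over indices with α_i > 0.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mdm(X, tol=1e-10, max_iter=10000):
    """Nearest point of co(X) to the origin via the MDM iteration."""
    m, n = len(X), len(X[0])
    alpha = [1.0 / m] * m                               # Step 1: start at the barycenter
    u = [sum(alpha[i] * X[i][c] for i in range(m)) for c in range(n)]
    for _ in range(max_iter):
        ipp = min(range(m), key=lambda i: dot(X[i], u))          # index of x''_k
        ip = max((i for i in range(m) if alpha[i] > 0),
                 key=lambda i: dot(X[i], u))                     # index of x'_k
        delta = dot(X[ip], u) - dot(X[ipp], u)                   # Delta_MDM,k(u_k)
        if delta <= tol:                                         # Step 5
            break
        d = [X[ipp][c] - X[ip][c] for c in range(n)]             # direction x''_k - x'_k
        denom = alpha[ip] * dot(d, d)
        t = 1.0 if denom == 0 else min(1.0, delta / denom)       # Steps 3-4: line search
        moved = t * alpha[ip]                                    # weight moved x' -> x''
        alpha[ip] -= moved
        alpha[ipp] += moved
        u = [u[c] + moved * d[c] for c in range(n)]
    return u

# Hull of three points whose nearest point to the origin is (1, 0)
X = [(1.0, -1.0), (1.0, 1.0), (2.0, 0.0)]
u_star = mdm(X)
```

On this small example the iteration transfers weight from (2, 0) and then balances the remaining two vertices, reaching the nearest point (1, 0) in a few steps.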
CHAPTER 2
A NEW ALGORITHM FOR FINDING THE MINIMUM DISTANCE BETWEEN TWO CONVEX HULLS

2.1. A New Algorithm for Finding the Minimum Distance of Two Convex Hulls

This chapter introduces a new algorithm for finding the minimum distance between two convex hulls and a proof of its convergence. The algorithm is an iterative method for computing the minimum distance between two convex hulls, and it has a good application to SVM. Let X, Y, Z, U, V and W be as in Chapter 1. As mentioned in Chapter 1, finding the point of W nearest to the origin is equivalent to solving the NPP. Let w* be the nearest point to the origin and u* − v* be the solution of the NPP; then w* = u* − v*. Note that for any w ∈ W, w = u − v = Σ_i α_i x_i − Σ_j β_j y_j, and

δ_MDM(w) = (w, w) − min_l (z_l, w) = (w, w) − min_{i,j} (x_i − y_j, w)

Δ_MDM(w) = max_{γ_l > 0} (z_l, w) − min_l (z_l, w) = max_{α_i > 0, β_j > 0} (x_i − y_j, w) − min_{i,j} (x_i − y_j, w)

Mitchell, Demyanov and Malozemov defined Δ_MDM(w) and δ_MDM in their paper [1].

Definition 2.1. For u ∈ U and v ∈ V, define

(6) Δ_x(u − v) = max_{α_i > 0} (x_i, u − v) − min_i (x_i, u − v)

(7) Δ_y(v − u) = max_{β_j > 0} (y_j, v − u) − min_j (y_j, v − u)

In fact, Δ_MDM(u − v) = Δ_x(u − v) + Δ_y(v − u); to show this, Lemma 2.1 is needed.
Lemma 2.1.

(8) max_{i,j} (x_i − y_j, u − v) = max_i (x_i, u − v) − min_j (y_j, u − v)
(9) min_{i,j} (x_i − y_j, u − v) = min_i (x_i, u − v) − max_j (y_j, u − v)
(10) max_j (y_j, u − v) = −min_j (y_j, v − u)

Proof. (8) is shown by two inequalities. (≤) Let (x_M − y_M, u − v) = max_{i,j} (x_i − y_j, u − v); then (x_M − y_M, u − v) = (x_M, u − v) − (y_M, u − v). Since (x_M, u − v) ≤ max_i (x_i, u − v) and (y_M, u − v) ≥ min_j (y_j, u − v), it follows that (x_M − y_M, u − v) ≤ max_i (x_i, u − v) − min_j (y_j, u − v). (≥) Let (x_m, u − v) = max_i (x_i, u − v) and (y_m, u − v) = min_j (y_j, u − v); then (x_m, u − v) − (y_m, u − v) = (x_m − y_m, u − v) ≤ max_{i,j} (x_i − y_j, u − v), so max_{i,j} (x_i − y_j, u − v) ≥ max_i (x_i, u − v) − min_j (y_j, u − v). The proof of (9) is similar to (8). (10) is also shown by inequalities: let (y_M, u − v) = max_j (y_j, u − v); then (y_M, u − v) = −(y_M, v − u), so max_j (y_j, u − v) = −(y_M, v − u) ≤ −min_j (y_j, v − u). Let (y_min, v − u) = min_j (y_j, v − u); then −(y_min, v − u) = (y_min, u − v) ≤ max_j (y_j, u − v). ∎

Lemma 2.2.

(11) Δ_MDM(u − v) = Δ_x(u − v) + Δ_y(v − u)

Proof. Since u − v ∈ W,

Δ_MDM(u − v) = max_{α_i>0, β_j>0} (x_i − y_j, u − v) − min_{i,j} (x_i − y_j, u − v)
= max_{α_i>0} (x_i, u − v) − min_{β_j>0} (y_j, u − v) − min_i (x_i, u − v) + max_j (y_j, u − v)
= [max_{α_i>0} (x_i, u − v) − min_i (x_i, u − v)] + [max_j (y_j, u − v) − min_{β_j>0} (y_j, u − v)]
= max_{α_i>0} (x_i, u − v) − min_i (x_i, u − v) + max_{β_j>0} (y_j, v − u) − min_j (y_j, v − u)
= Δ_x(u − v) + Δ_y(v − u).

Using (8), (9) and (10), Lemma 2.2 is shown. ∎

The new algorithm for finding the minimum distance between two convex hulls is an iterative method, and it uses the following notation. Let (x'_k, u_k − v_k) = max_{α_{ik} > 0} (x_{ik}, u_k − v_k) and (x''_k, u_k − v_k) = min_i (x_{ik}, u_k − v_k), and set

(12) Δ_x(u_k − v_k) = (x'_k − x''_k, u_k − v_k)
(13) Δ_y(v_k − u_k) = (y'_k − y''_k, v_k − u_k)

Algorithm 2.1
Step 1. Choose u_k ∈ U and v_k ∈ V, where u_k = Σ_i α_{ik} x_i and v_k = Σ_j β_{jk} y_j.
Step 2. Find x''_k and x'_k, where (x''_k, u_k − v_k) = min_i (x_{ik}, u_k − v_k) and (x'_k, u_k − v_k) = max_{α_{ik} > 0} (x_{ik}, u_k − v_k).
Step 3. Set u_{k+1} = u_k + t_k α'_k (x''_k − x'_k), where 0 ≤ t_k ≤ 1 and t_k = min{1, Δ_x(u_k − v_k)/(α'_k ‖x'_k − x''_k‖²)}. Note that t_k is defined by (u_k(t_k) − v_k, u_k(t_k) − v_k) = min_{t∈[0,1]} (u_k(t) − v_k, u_k(t) − v_k), where u_k(t) = u_k + t α'_k (x''_k − x'_k) and (u_k(t) − v_k, u_k(t) − v_k) = (α'_k)² ‖x'_k − x''_k‖² t² − 2 α'_k Δ_x(u_k − v_k) t + (u_k − v_k, u_k − v_k).
Step 4. Find y''_k and y'_k, where (y''_k, v_k − u_{k+1}) = min_j (y_{jk}, v_k − u_{k+1}) and (y'_k, v_k − u_{k+1}) = max_{β_{jk} > 0} (y_{jk}, v_k − u_{k+1}).
Step 5. Set v_{k+1} = v_k + t'_k β'_k (y''_k − y'_k), where 0 ≤ t'_k ≤ 1 and t'_k = min{1, Δ_y(v_k − u_{k+1})/(β'_k ‖y'_k − y''_k‖²)}.
Note that t'_k is defined by (v_k(t'_k) − u_{k+1}, v_k(t'_k) − u_{k+1}) = min_{t'∈[0,1]} (v_k(t') − u_{k+1}, v_k(t') − u_{k+1}), where v_k(t') = v_k + t' β'_k (y''_k − y'_k) and (v_k(t') − u_{k+1}, v_k(t') − u_{k+1}) = (β'_k)² ‖y'_k − y''_k‖² t'² − 2 t' β'_k Δ_y(v_k − u_{k+1}) + (v_k − u_{k+1}, v_k − u_{k+1}).
Step 6. Iterating this way, a sequence {u_k − v_k} is obtained such that ‖u_{k+1} − v_{k+1}‖ ≤ ‖u_k − v_k‖, since ‖u_{k+1} − v_k‖ ≤ ‖u_k − v_k‖ and ‖u_{k+1} − v_{k+1}‖ ≤ ‖u_{k+1} − v_k‖.

Figure 2.1. Algorithm 2.1

Figure 2.1 describes how to find the new u and the new v: first find the new u by using the old u and the old v, then find the new v by using the old v and the new u. Iterating this way, the algorithm finds the solution u* − v* of the NPP.
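To make Steps 1–6 concrete, here is a minimal Python sketch of Algorithm 2.1 (a simplified, hypothetical implementation written for this summary; the helper names are invented). One MDM-style update is performed on U using the direction u_k − v_k, then one on V using v_k − u_{k+1}, in exactly the order of Steps 2–5, and the loop stops when Δ_x + Δ_y is numerically zero, which is the optimality criterion established in Section 2.2.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mdm_step(P, coef, w):
    """One update of the point sum_i coef[i]*P[i] using direction w (Steps 2-3 / 4-5)."""
    m = len(P)
    ipp = min(range(m), key=lambda i: dot(P[i], w))                  # minimizer over all i
    ip = max((i for i in range(m) if coef[i] > 0),
             key=lambda i: dot(P[i], w))                             # maximizer over coef > 0
    delta = dot(P[ip], w) - dot(P[ipp], w)                           # Delta for this hull
    d = [P[ipp][c] - P[ip][c] for c in range(len(w))]
    denom = coef[ip] * dot(d, d)
    t = 1.0 if denom == 0 else min(1.0, delta / denom)               # clipped line search
    moved = t * coef[ip]                                             # weight moved x' -> x''
    coef[ip] -= moved
    coef[ipp] += moved
    return moved, d, delta

def min_distance(X, Y, tol=1e-10, max_iter=10000):
    """Minimum distance between co(X) and co(Y): a sketch of Algorithm 2.1."""
    n = len(X[0])
    alpha = [1.0 / len(X)] * len(X)
    beta = [1.0 / len(Y)] * len(Y)
    u = [sum(a * p[c] for a, p in zip(alpha, X)) for c in range(n)]
    v = [sum(b * q[c] for b, q in zip(beta, Y)) for c in range(n)]
    for _ in range(max_iter):
        w = [ui - vi for ui, vi in zip(u, v)]                        # u_k - v_k
        moved, d, dx = mdm_step(X, alpha, w)                         # Steps 2-3: update u
        u = [ui + moved * di for ui, di in zip(u, d)]
        w = [vi - ui for vi, ui in zip(v, u)]                        # v_k - u_{k+1}
        moved, d, dy = mdm_step(Y, beta, w)                          # Steps 4-5: update v
        v = [vi + moved * di for vi, di in zip(v, d)]
        if dx + dy <= tol:                                           # Delta_x + Delta_y = 0
            break
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) ** 0.5

# Two unit squares one unit apart: the minimum distance is 1
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [(2.0, 0.0), (3.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
dist = min_distance(X, Y)
```

Note that, unlike the Gilbert/MDM approach applied to W = co(Z), this sketch never forms the m·s difference vectors; it only stores the m + s original points and their two coefficient vectors.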
2.2. Convergence for Algorithm 2.1

Before proving the convergence of Algorithm 2.1, define the following.

Definition 2.2. For u ∈ U and v ∈ V, define

(14) δ_x(u − v) = (u, u − v) − min_i (x_i, u − v)
(15) δ_y(v − u) = (v, v − u) − min_j (y_j, v − u)

Lemma 2.3. ‖u − v − (u* − v*)‖² ≤ δ_x(u − v) + δ_y(v − u).

Proof. Lemma 2 in [1] states that for any w ∈ W, ‖w − w*‖² ≤ δ_MDM(w). Note that since u − v ∈ W,

δ_MDM(u − v) = (u − v, u − v) − min_{i,j} (x_i − y_j, u − v)
= (u, u − v) − (v, u − v) − min_i (x_i, u − v) − min_j (y_j, v − u)
= [(u, u − v) − min_i (x_i, u − v)] + [(v, v − u) − min_j (y_j, v − u)]
= δ_x(u − v) + δ_y(v − u).

Then ‖u − v − (u* − v*)‖² ≤ δ_MDM(u − v) = δ_x(u − v) + δ_y(v − u). ∎

Lemma 2.4. Suppose u_k ∈ U, v_k ∈ V, k = 1, 2, ... is a sequence of points such that ‖u_{k+1} − v_{k+1}‖ ≤ ‖u_k − v_k‖, and there is a subsequence such that δ_x(u_{kj} − v_{kj}) → 0 and δ_y(v_{kj} − u_{kj}) → 0. Then u_k − v_k → u* − v* = w*.

Proof. Since ‖u_k − v_k‖ is a nonincreasing sequence bounded below, it has a limit ‖u_k − v_k‖ → μ. By Lemma 2.3, ‖u_{kj} − v_{kj} − (u* − v*)‖ → 0, since δ_x(u_{kj} − v_{kj}) → 0 and δ_y(v_{kj} − u_{kj}) → 0. This implies u_{kj} − v_{kj} → u* − v* = w*. Hence u_k − v_k → u* − v* = w*: any convergent subsequence of u_k − v_k must converge to w* = u* − v* because of the uniqueness of w* = u* − v*. ∎
Lemmas 2.3 and 2.4 are the key lemmas for proving the convergence of Algorithm 2.1. After the following theorem, support vectors will be defined; support vectors are a significant concept in this dissertation and in support vector machines (SVM).

Theorem 1.

(16) min_i (x_i, u* − v*) = (u*, u* − v*)
(17) min_j (y_j, v* − u*) = (v*, v* − u*)

Proof. Let u* = Σ_i α_i* x_i and v* = Σ_j β_j* y_j. Note that min_i (x_i − v*, u* − v*) ≥ (u* − v*, u* − v*) and min_j (y_j − u*, v* − u*) ≥ (v* − u*, v* − u*). Then

min_i (x_i, u* − v*) − (v*, u* − v*) ≥ (u*, u* − v*) − (v*, u* − v*),

so min_i (x_i, u* − v*) ≥ (u*, u* − v*). But (u*, u* − v*) = (Σ_i α_i* x_i, u* − v*) = Σ_i α_i* (x_i, u* − v*) ≥ min_i (x_i, u* − v*). Thus min_i (x_i, u* − v*) = (u*, u* − v*). Also,

min_j (y_j, v* − u*) − (u*, v* − u*) ≥ (v*, v* − u*) − (u*, v* − u*),

so min_j (y_j, v* − u*) ≥ (v*, v* − u*), and (v*, v* − u*) = (Σ_j β_j* y_j, v* − u*) = Σ_j β_j* (y_j, v* − u*) ≥ min_j (y_j, v* − u*). Thus min_j (y_j, v* − u*) = (v*, v* − u*). ∎

Definition 2.3. Let the solution of the NPP be u* − v*. Those x_i corresponding to α_i* > 0 and those y_j corresponding to β_j* > 0 are called support vectors. As seen in Figure 2.2, u* and v* can be represented by the support vectors x_1, x_2, y_1 and y_6.

Subtracting (v*, u* − v*) from both sides of (16) gives min_i (x_i − v*, u* − v*) = (u* − v*, u* − v*). Similarly, from (17), min_j (y_j − u*, v* − u*) = (v* − u*, v* − u*).
Figure 2.2. Support Vectors

Lemma 2.5. For any vector u − v, u ∈ U, v ∈ V,

(18) α' Δ_x(u − v) ≤ δ_x(u − v) ≤ Δ_x(u − v), where α' > 0 and (x', u − v) = max_{α_i > 0} (x_i, u − v);
(19) β' Δ_y(v − u) ≤ δ_y(v − u) ≤ Δ_y(v − u), where β' > 0 and (y', v − u) = max_{β_j > 0} (y_j, v − u).

Proof of (18). Let u = Σ_{i=1}^m α_i x_i. Then

(u, u − v) = Σ_{i=1}^m α_i (x_i, u − v) ≤ max_{α_i > 0} (x_i, u − v),

so

δ_x(u − v) = (u, u − v) − min_i (x_i, u − v) ≤ max_{α_i > 0} (x_i, u − v) − min_i (x_i, u − v) = Δ_x(u − v).
For the other inequality, let (x'', u − v) = min_i (x_i, u − v) and (x', u − v) = max_{α_i > 0} (x_i, u − v); then Δ_x(u − v) = (x' − x'', u − v). Set A' = (α'_1, ..., α'_m), where α'_i = α_i if i ≠ i', i''; α'_{i'} = 0; and α'_{i''} = α_{i''} + α_{i'}; that is, all of the weight α_{i'} is moved from x' to x''. Let u' = u + α_{i'}(x'' − x'). Then

(u' − v, u − v) = (u + α_{i'}(x'' − x') − v, u − v) = (u − v, u − v) + α_{i'}(x'' − x', u − v) = (u − v, u − v) − α_{i'} Δ_x(u − v).

Since u' ∈ U, (u' − v, u − v) ≥ min_i (x_i − v, u − v), so

δ_x(u − v) = (u − v, u − v) − min_i (x_i − v, u − v) ≥ (u − v, u − v) − (u' − v, u − v) = α_{i'} Δ_x(u − v).

The proof of (19) is similar to the proof of (18). ∎

Lemma 2.6.

(20) lim_{k→∞} α'_k Δ_x(u_k − v_k) = 0
(21) lim_{k→∞} β'_k Δ_y(v_k − u_k) = 0
Proof of (20). First, observe that u_k(t) = u_k + t α'_k (x''_k − x'_k), v_k(t') = v_k + t' β'_k (y''_k − y'_k) and ‖u_{k+1} − v_{k+1}‖ ≤ ‖u_k − v_k‖. Note that

(u_k(t) − v_k, u_k(t) − v_k) = (u_k − v_k, u_k − v_k) − 2t α'_k Δ_x(u_k − v_k) + t² (α'_k)² ‖x'_k − x''_k‖².

Suppose the assertion is false. Then there is a subsequence u_{kj} − v_{kj} for which α'_{kj} Δ_x(u_{kj} − v_{kj}) ≥ ε > 0. Then

(u_{kj}(t) − v_{kj}, u_{kj}(t) − v_{kj}) ≤ (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − 2tε + t²d², where d = max_{l,p} ‖x_l − x_p‖,

because the right-hand side is minimized at t* = min{ε/d², 1}.

(2.6.1) If t* = ε/d², then

(u_{kj}(t*) − v_{kj}, u_{kj}(t*) − v_{kj}) ≤ (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − 2(ε/d²)ε + (ε/d²)²d² = (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − ε²/d² = (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − t*ε.

(2.6.2) If t* = 1 (so ε/d² > 1 and hence ε > t*d²), then

(u_{kj}(t*) − v_{kj}, u_{kj}(t*) − v_{kj}) ≤ (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − 2t*ε + t*²d² ≤ (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − 2t*ε + εt* = (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − t*ε.

By (2.6.1) and (2.6.2), (u_{kj}(t*) − v_{kj}, u_{kj}(t*) − v_{kj}) ≤ (u_{kj} − v_{kj}, u_{kj} − v_{kj}) − t*ε, so ‖u_k − v_k‖² would decrease by at least the fixed amount t*ε infinitely often. This contradicts the fact that the nonincreasing sequence ‖u_k − v_k‖ is bounded below. The proof of (21) is similar to (20). ∎

Lemma 2.7.

(22) lim Δ_x(u_k − v_k) = 0
(23) lim Δ_y(v_k − u_k) = 0

Proof. Suppose lim Δ_x(u_k − v_k) ≠ 0 and lim Δ_y(v_k − u_k) ≠ 0. Then there is Δ_x° > 0 such that, along a subsequence, Δ_x(u_k − v_k) → Δ_x°.
Then for sufficiently large k > K_1,

(24) Δ_x(u_k − v_k) ≥ Δ_x°/2.

And there is Δ_y° > 0 such that, along a subsequence, Δ_y(v_k − u_k) → Δ_y°. Then for sufficiently large k > K_2,

(25) Δ_y(v_k − u_k) ≥ Δ_y°/2.

By Lemma 2.6, α'_k → 0. Let t_k* be a point at which (u_k(t) − v_k, u_k(t) − v_k) attains its global minimum. Then

(u_k(t) − v_k, u_k(t) − v_k) = (u_k + tα'_k(x''_k − x'_k) − v_k, u_k + tα'_k(x''_k − x'_k) − v_k) = (u_k − v_k, u_k − v_k) − 2tα'_k Δ_x(u_k − v_k) + t²(α'_k)²‖x'_k − x''_k‖²,

so

(26) t_k* = Δ_x(u_k − v_k)/(α'_k ‖x'_k − x''_k‖²).

By Δ_x(u_k − v_k) ≥ Δ_x°/2 and α'_k → 0, t_k* → ∞. Again by Lemma 2.6, β'_k → 0; let t'_k* be a point at which (v_k(t') − u_k, v_k(t') − u_k) attains its global minimum. Then

(27) t'_k* = Δ_y(v_k − u_k)/(β'_k ‖y'_k − y''_k‖²).

Again by Δ_y(v_k − u_k) ≥ Δ_y°/2 and β'_k → 0, t'_k* → ∞. Hence, for large k > K > K_1, the minimum of (u_k(t) − v_k, u_k(t) − v_k) on the segment 0 ≤ t ≤ 1 is attained at t_k = 1, so that for such values of k

(28) u_{k+1} = u_k + α'_k (x''_k − x'_k).
Also, for large k > K' > K_2, the minimum of (v_k(t') − u_k, v_k(t') − u_k) on the segment 0 ≤ t' ≤ 1 is attained at t'_k = 1, so that for such values of k

(29) v_{k+1} = v_k + β'_k (y''_k − y'_k).

Let u_{kj} − v_{kj} be a subsequence such that

(30) Δ_x(u_{kj} − v_{kj}) → Δ_x°.

By discarding terms from the sequence u_{kj} − v_{kj} (and using compactness, so that the subsequence has a limit u° − v°), the following conditions are satisfied:

(31) α'_{kj} → α° and β'_{kj} → β°, where α'_{kj} and β'_{kj} are the coefficients selected in Steps 2 and 4;

(32) x''_{kj} = x'' ∈ X, meaning that min_i (x_i, u_{kj} − v_{kj}) is attained for all k_j at the same vector: (x'', u_{kj} − v_{kj}) = min_i (x_i, u_{kj} − v_{kj});

(33) x'_{kj} = x' ∈ X, with

(34) (x', u_{kj} − v_{kj}) = max_{α_i > 0} (x_i, u_{kj} − v_{kj}).

Using (30), (31), (32) and (33), the following is obtained:

(x' − x'', u° − v°) = Δ_x°.

Now introduce new notation:

(35) ρ_x = min_i (x_i, u° − v°)
(36) X_1 = {x_i ∈ X | (x_i, u° − v°) = ρ_x}
(37) X_2 = X \ X_1

Note that

(38) x_i ∈ X_2 ⟹ (x_i, u° − v°) > ρ_x
(39) x'' ∈ X_1 and x' ∈ X_2. Set

(40) ρ'_x = min_{x_i ∈ X_2} (x_i, u° − v°),

so that

(41) ρ'_x > ρ_x,

and define

(42) τ_x = min{Δ_x°, ρ'_x − ρ_x}.

Note that for x_i ∈ X_2, (x_i, u° − v°) ≥ ρ_x + τ_x. Choose δ_ox such that whenever ‖(u − v) − (u° − v°)‖ < δ_ox,

(43) max_i |(x_i, u − v) − (x_i, u° − v°)| < τ_x/4.

Claim. Let k_j be such that ‖u_{kj} − v_{kj} − (u° − v°)‖ < δ_ox. Then X_1(u_{kj} − v_{kj}) ⊂ X_1, where X_1(u_{kj} − v_{kj}) = {x_i ∈ X | (x_i, u_{kj} − v_{kj}) = min_l (x_l, u_{kj} − v_{kj})}.

Proof of claim. Let x_i ∈ X_1(u_{kj} − v_{kj}). Then, using (43) twice and the fact that x'' ∈ X_1,

(x_i, u° − v°) ≤ (x_i, u_{kj} − v_{kj}) + τ_x/4 = min_l (x_l, u_{kj} − v_{kj}) + τ_x/4 ≤ (x'', u_{kj} − v_{kj}) + τ_x/4 ≤ (x'', u° − v°) + τ_x/4 + τ_x/4 = ρ_x + τ_x/2.
Since (x_i, u° − v°) ≤ ρ_x + τ_x/2 < ρ_x + τ_x, it follows that x_i ∈ X_1. (End of claim.)

Let k_j > K be such that the following conditions hold:

(44) Δ_x(u_k − v_k) ≥ Δ_x° − τ_x/4;

(45) ‖u_{kj} − v_{kj} − (u° − v°)‖ < δ_ox/2;

(46) Σ_{l=1}^p α'_{kj+l−1} < δ_ox/(4d), which is possible since α'_k → 0, where d_x = max_{l,r} ‖x_l − x_r‖ > 0, d_y = max_{l,r} ‖y_l − y_r‖ and d = max{d_x, d_y};

(47) Σ_{l=1}^p β'_{kj+l−1} < δ_ox/(4d), which is possible since β'_k → 0.

By (28) and (46),

(48) u_{kj+p} = u_{kj} + Σ_{l=1}^p α'_{kj+l−1}(x''_{kj+l−1} − x'_{kj+l−1}),

(49) ‖u_{kj+p} − u_{kj}‖ < (δ_ox/4d)·d = δ_ox/4.

By (29) and (47),

v_{kj+p} = v_{kj} + Σ_{l=1}^p β'_{kj+l−1}(y''_{kj+l−1} − y'_{kj+l−1}),

‖v_{kj+p} − v_{kj}‖ < (δ_ox/4d)·d = δ_ox/4.

Then by (45),

‖u_{kj+p} − v_{kj+p} − (u° − v°)‖ ≤ ‖u_{kj+p} − v_{kj+p} − (u_{kj} − v_{kj})‖ + ‖u_{kj} − v_{kj} − (u° − v°)‖ < δ_ox.
Hence X_1(u_{kj+p} − v_{kj+p}) ⊂ X_1. Next, for all x_i ∈ X_1 and p ∈ [1 : m], the following is obtained:

(x_i, u_{kj+p} − v_{kj+p}) = (x_i, u° − v°) + (x_i, u_{kj+p} − v_{kj+p} − (u° − v°)) ≤ ρ_x + τ_x/4 ≤ Δ_x° − τ_x + ρ_x + τ_x/4 = Δ_x° + ρ_x − 3τ_x/4,

that is,

(50) (x_i, u_{kj+p} − v_{kj+p}) ≤ Δ_x° + ρ_x − 3τ_x/4.

Now show that for some vector u_{kj+p} − v_{kj+p}, p ∈ [1 : m],

max_{α_i > 0} (x_i, u_{kj+p} − v_{kj+p}) ≤ Δ_x° + ρ_x − 3τ_x/4.

If this inequality fails to hold for some p, then x'_{kj+p} ∈ X_2. Since X_1(u_{kj+p} − v_{kj+p}) ⊂ X_1, the update u_{kj+p+1} = u_{kj+p} + α'_{kj+p}(x''_{kj+p} − x'_{kj+p}) leaves the vector x'_{kj+p} with zero coefficient, while the newly introduced vector x''_{kj+p} satisfies (50). Since X_2 contains at most m − 1 vectors, there is p ∈ [1 : m] such that all vectors appearing in the representation with nonzero coefficients satisfy (50). On the other hand,

max_{α_i > 0} (x_i, u_{kj+p} − v_{kj+p}) ≤ Δ_x° + ρ_x − 3τ_x/4

and

(51) min_i (x_i, u_{kj+p} − v_{kj+p}) ≥ ρ_x − τ_x/4.

Thus

Δ_x(u_{kj+p} − v_{kj+p}) = max_{α_i > 0} (x_i, u_{kj+p} − v_{kj+p}) − min_i (x_i, u_{kj+p} − v_{kj+p}) ≤ Δ_x° − τ_x/2.

This contradicts (44). Thus lim Δ_x(u_k − v_k) = 0. Similarly, lim Δ_y(v_k − u_k) = 0 can be shown. ∎

Theorem 2. Δ_x(u − v) + Δ_y(v − u) = 0 if and only if u − v = u* − v*.
Proof. By Lemma 2.2, Δ_x(u − v) + Δ_y(v − u) = Δ_MDM(u − v), and by Theorem 1 and Lemma 2 in [1], Δ_MDM(u − v) = 0 if and only if u − v = u* − v*. Thus the theorem is proved. ∎

Theorem 3. The sequence u_k − v_k converges to u* − v*.

Proof. By Lemma 2.7, there are subsequences u_{kj} − v_{kj} and v_{kj} − u_{kj} such that Δ_x(u_{kj} − v_{kj}) → 0 and Δ_y(v_{kj} − u_{kj}) → 0. Then by Lemmas 2.3, 2.4 and 2.5, u_k − v_k converges to u* − v*. ∎

Theorem 4. Let

(52) G = {u = Σ_i α_i x_i | (u, u* − v*) = (u*, u* − v*)}
(53) G' = {v = Σ_j β_j y_j | (v, v* − u*) = (v*, v* − u*)}

Then for large k, u_k ∈ G and v_k ∈ G'.

Proof. By Theorem 1, min_i (x_i, u* − v*) = (u*, u* − v*) and min_j (y_j, v* − u*) = (v*, v* − u*). Set

X_1 = {x_i ∈ X | (x_i, u* − v*) = (u*, u* − v*)}, X_2 = X \ X_1 = {x_i ∈ X | (x_i, u* − v*) > (u*, u* − v*)},
Y_1 = {y_j ∈ Y | (y_j, v* − u*) = (v*, v* − u*)}, Y_2 = {y_j ∈ Y | (y_j, v* − u*) > (v*, v* − u*)}.

If X_2 = ∅ then u_k ∈ G for all k, and if Y_2 = ∅ then v_k ∈ G' for all k. Now suppose X_2 ≠ ∅ and Y_2 ≠ ∅. Let

(54) τ = min_{x_i ∈ X_2} (x_i, u* − v*) − (u*, u* − v*) > 0
and

(55) τ' = min_{y_j ∈ Y_2} (y_j, v* − u*) − (v*, v* − u*) > 0.

Note that (x_i − v_k, u_k − v_k) → (x_i − v*, u* − v*) and (y_j − u_k, v_k − u_k) → (y_j − u*, v* − u*), since u_k − v_k → u* − v*. It follows that for large k > K_1,

(56) max_i |(x_i − v_k, u_k − v_k) − (x_i − v*, u* − v*)| < τ/4,
(57) max_j |(y_j − u_k, v_k − u_k) − (y_j − u*, v* − u*)| < τ'/4.

Then by the claim in Lemma 2.7, for large k > K_1,

X_1(u_k − v_k) ⊂ X_1, where X_1(u_k − v_k) = {x_i ∈ X | (x_i, u_k − v_k) = min_i (x_i, u_k − v_k)}, and
Y_1(v_k − u_k) ⊂ Y_1, where Y_1(v_k − u_k) = {y_j ∈ Y | (y_j, v_k − u_k) = min_j (y_j, v_k − u_k)}.

Now let x_i ∈ X_1. Then

(58) (x_i, u* − v*) = (u*, u* − v*).

By (56),

(59) |(x_i, u_k − v_k) − (x_i, u* − v*)| < τ/4,

and with (58), |(x_i, u_k − v_k) − (u*, u* − v*)| < τ/4. Then

(60) (x_i, u_k − v_k) < (u*, u* − v*) + τ/4.

Now let x_i ∈ X_2. Then by (54),

(61) (x_i, u* − v*) ≥ (u*, u* − v*) + τ,

and by (56),

(62) −τ/4 < (x_i, u_k − v_k) − (x_i, u* − v*) < τ/4.

Adding (x_i, u_k − v_k) − (x_i, u* − v*) to both sides of (61):
(x_i, u_k − v_k) ≥ (u*, u* − v*) + τ + [(x_i, u_k − v_k) − (x_i, u* − v*)],

and by (62),

(63) (x_i, u_k − v_k) > (u*, u* − v*) + 3τ/4.

Similarly,

(64) (y_j, v_k − u_k) < (v*, v* − u*) + τ'/4 if y_j ∈ Y_1

and

(65) (y_j, v_k − u_k) > (v*, v* − u*) + 3τ'/4 if y_j ∈ Y_2

are obtained. For u_k with k > K_1, if the convex hull representation of u_k involves some x_i ∈ X_2 with a nonzero coefficient, then Δ_x(u_k − v_k) ≥ τ/2 by (60), (63) and X_1(u_k − v_k) ⊂ X_1. Similarly, if v_k, k > K'_1, involves some y_j ∈ Y_2 with a nonzero coefficient, then

(66) Δ_y(v_k − u_k) ≥ τ'/2.

Then for large k > K_1 and k > K'_1, u_k and v_k can be written as below:

(67) u_k = Σ_{{i | x_i ∈ X_1}} α_i^(k) x_i + Σ_{{i | x_i ∈ X_2}} α_i^(k) x_i,
(68) v_k = Σ_{{j | y_j ∈ Y_1}} β_j^(k) y_j + Σ_{{j | y_j ∈ Y_2}} β_j^(k) y_j.

Consider (u_k − u*, u* − v*):

(u_k − u*, u* − v*) = (u_k, u* − v*) − (u*, u* − v*)
= (Σ_{x_i ∈ X_1} α_i^(k) x_i + Σ_{x_i ∈ X_2} α_i^(k) x_i, u* − v*) − (u*, u* − v*)
= Σ_{x_i ∈ X_1} α_i^(k) (x_i, u* − v*) + Σ_{x_i ∈ X_2} α_i^(k) (x_i, u* − v*) − Σ_{x_i ∈ X_1} α_i^(k) (u*, u* − v*) − Σ_{x_i ∈ X_2} α_i^(k) (u*, u* − v*)
= Σ_i α_i^(k) (x_i − u*, u* − v*) ≥ τ Σ_{x_i ∈ X_2} α_i^(k),
since (x_i, u* − v*) = (u*, u* − v*) if x_i ∈ X_1 and (x_i − u*, u* − v*) ≥ τ if x_i ∈ X_2. By the convergence of u_k − v_k, the left-hand side of this inequality tends to zero as k → ∞. This means that Σ_{x_i ∈ X_2} α_i^(k) → 0.

(69) Similarly, (v_k − v*, v* − u*) = Σ_j β_j^(k) (y_j − v*, v* − u*) ≥ τ' Σ_{{j | y_j ∈ Y_2}} β_j^(k),

so Σ_{{j | y_j ∈ Y_2}} β_j^(k) → 0 can be seen as well.

Choose K ≥ K_1 so large that for k ≥ K,

Σ_{x_i ∈ X_2} α_i^(k) ≤ τ/(2d²), where d = max_{x_i ∈ X_1, x_j ∈ X_2} ‖x_i − x_j‖ > 0.

Let t_k* be a point at which (u_k(t) − v_k, u_k(t) − v_k) attains its global minimum. If the representation of u_k, k ≥ K, involves some x_i ∈ X_2 with a nonzero coefficient, then

t_k* = Δ_x(u_k − v_k)/(α'_k ‖x'_k − x''_k‖²) ≥ (τ/2)/((Σ_{x_i ∈ X_2} α_i^(k)) d²) ≥ 1,

by Δ_x(u_k − v_k) ≥ τ/2. Hence u_{k+1} = u_k + α'_k (x''_k − x'_k). Similarly, choose K' ≥ K'_1 so large that for k ≥ K', v_{k+1} = v_k + β'_k (y''_k − y'_k). Now, for u_{k+1}, x''_k ∈ X_1, while x'_k disappears from the representation of u_{k+1} (i.e., x'_k has zero coefficient in u_{k+1}). Then there is l ∈ {1, ..., m} such that u_{k+l} does not involve any vector x_i ∈ X_2, so there is k > K + l such that u_k ∈ G. Similarly, there is some k such that v_k ∈ G'. ∎
CHAPTER 3
SUPPORT VECTOR MACHINES

3.1. Introduction for Support Vector Machines (SVM)

Support vector machines (SVM) are a family of learning algorithms used for the classification of objects into two classes. The theory was developed by Vapnik and Chervonenkis, and SVM became very popular in the early 1990s. SVM has been broadly applied to all kinds of classification tasks, from handwritten recognition to face detection in images. SVM has gained popularity because it can be applied to a wide range of real-world applications with computational efficiency; moreover, it is theoretically robust.

In a linearly separable data set there are many decision boundaries. In order to generalize well to new data, a good decision boundary is needed: the decision boundary should be as far away from the two classes as possible. The shortest distance between the two boundaries is called the margin. SVM maximizes the margin, so that the maximized margin results in fewer errors on future tests, while other methods only reduce classification errors on the training data. In this view, SVM generalizes better than other methods. To illustrate this, consider Figure 3.1. Lines (1) and (2) both classify the two groups with zero error, but the goal of classification is to classify future inputs. If test point T belongs to the O group, then (1) and (2) are no longer equivalent; this means that (2) is a better generalization than (1).

The maximal margin can be obtained by solving a quadratic programming (QP) optimization problem: the optimal hyperplane, which has the maximal margin, is the solution of this QP. The SVM QP optimization problem and the classifier will be introduced in Section 3.2. In 1992, Bernhard Boser, Isabelle Guyon and Vapnik added the kernel trick, which will be introduced
Figure 3.1. SVM Generalization

in Section 3.5, to the linear classifier. The kernel trick makes linear classification possible in a feature space that is usually bigger than the original space.

3.2. SVM for Linearly Separable Sets

This section introduces the most basic SVM problem: SVM with linearly separable sets. Let S be a training set of points x_i ∈ R^m with labels t_i ∈ {−1, 1} for i = 1, ..., N:

S = {(x_1, t_1), ..., (x_N, t_N)},

i.e., if t_i = +1 then x_i belongs to the first class, and if t_i = −1 then x_i belongs to the second class. Let S be linearly separable; then there is a hyperplane wx + b = 0, with w ∈ R^m, that correctly classifies all x_i in S. Define the classifier f(x_i) = wx_i + b, with f(x_i) > 0 if t_i = +1 and f(x_i) < 0 if t_i = −1, i.e.,

wx_i + b > 0 if t_i = +1
wx_i + b < 0 if t_i = −1
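The classifier just defined is a sign test on wx_i + b; a minimal illustration with made-up values of w and b (chosen here for the example, not taken from the dissertation):

```python
def classify(w, b, x):
    """Linear classifier: +1 if wx + b > 0, -1 if wx + b < 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score > 0 else -1

w, b = [1.0, -1.0], 0.5

print(classify(w, b, [2.0, 0.0]))   # score 2.5 > 0  -> +1 (first class)
print(classify(w, b, [0.0, 2.0]))   # score -1.5 < 0 -> -1 (second class)
```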
Figure 3.2. Linear Classifier

Even if there is a classifier that separates the training set correctly, a maximal margin is needed. In order to separate the two sets correctly and have better generalization, it is necessary to maximize the margin. To compute an optimal hyperplane, consider the following inequalities for all i = 1, ..., N:

wx_i + b ≥ 1 if t_i = +1
wx_i + b ≤ −1 if t_i = −1

These two inequalities define the decision boundaries. To compute the margin, consider the following two hyperplanes:

wx_1 + b = 1
wx_2 + b = 0

where the distance from x_1 to x_2 is the shortest distance between the two hyperplanes. This means that the direction x_2 − x_1 is parallel to w. Subtracting the first equality from the second, w(x_2 − x_1) = −1 is obtained; since w and x_2 − x_1 are parallel, ‖x_2 − x_1‖ = 1/‖w‖. In order to minimize the error, ‖x_2 − x_1‖ should be maximized, and maximizing ‖x_2 − x_1‖ is equivalent to minimizing ‖w‖. Thus, the optimal hyperplane can be found by solving the following optimization problem:

(70) min (1/2)‖w‖²
Figure 3.3. Classifier with Margin

subject to t_i(wᵀx_i + b) ≥ 1.

3.3. SVM Dual Formulation

This section introduces the dual formulation of the SVM optimization problem. The dual formulation is usually more efficient in implementation; also, in the dual formulation, the nonseparable case generalizes more naturally. The most important aspect of the dual formulation is that the kernel trick can be used in it. The last part of this section derives the dual formulation using the KKT conditions of optimization theory.

Consider the Lagrangian function from (70):

(71) L(w, b, λ) = (1/2)‖w‖² − Σ_{i=1}^N λ_i [t_i(wᵀx_i + b) − 1],

where λ = (λ_1, ..., λ_N), and differentiate with respect to the original variables:

(72) ∂L/∂w (w, b, λ) = w − Σ_{i=1}^N λ_i t_i x_i = 0,
(73) ∂L/∂b (w, b, λ) = −Σ_{i=1}^N λ_i t_i = 0.
From (72),

(74) w = Σ_{i=1}^N λ_i t_i x_i.

Substituting (74) into (71),

L(w, b, λ) = (1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j (x_i · x_j) − Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j (x_i · x_j) − b Σ_{i=1}^N λ_i t_i + Σ_{i=1}^N λ_i
= −(1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j (x_i · x_j) + Σ_{i=1}^N λ_i.

Thus SVM-DUAL is the following:

(75) max −(1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j (x_i · x_j) + Σ_{i=1}^N λ_i
s.t. Σ_{i=1}^N λ_i t_i = 0, λ_i ≥ 0.

Once λ is found, (w, b) can be found easily by using w = Σ_{i=1}^N λ_i t_i x_i and wx_i + b = 1 for a support vector x_i in the first class.

3.4. Linearly Nonseparable Training Set

In the case of a linearly nonseparable training set, a linear classifier wx + b = 0 cannot separate the training set correctly; the linear classifier needs to allow some errors. One of the following three cases happens for each training point with the linear classifier wx + b = 0:

Case 1: t_i(wx_i + b) ≥ 1
Case 2: 0 ≤ t_i(wx_i + b) < 1
Case 3: t_i(wx_i + b) < 0

Cases 2 and 3 have errors while Case 1 does not have an error. These errors can be measured using slack variables s_i: in Case 1, s_i = 0; in Case 2, 0 < s_i ≤ 1; and in Case 3, s_i > 1. Figure 3.4 shows the case of a linearly nonseparable set.
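The three cases can be read off from the slack value s_i = max(0, 1 − t_i(wx_i + b)); a short illustration with made-up points and a fixed hyperplane (all values chosen here for illustration only):

```python
def slack(w, b, x, t):
    """Slack s = max(0, 1 - t*(w.x + b)); s=0 in Case 1, 0<s<=1 in Case 2, s>1 in Case 3."""
    margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

w, b = [1.0, 0.0], 0.0

print(slack(w, b, [2.0, 0.0], +1))   # Case 1: correct, outside the margin -> 0.0
print(slack(w, b, [0.5, 0.0], +1))   # Case 2: correct, inside the margin  -> 0.5
print(slack(w, b, [-1.0, 0.0], +1))  # Case 3: misclassified               -> 2.0
```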
Figure 3.4. Linearly Nonseparable Set

In the case of a linearly nonseparable set, the margin is maximized while the number of points with s_i > 0 is minimized. So there are two goals: the first goal is to maximize the margin, and the second goal is to minimize the number of s_i > 0. Now, consider

(76) (1/2)||w||² + C Σ_{i=1}^N s_i

In (76), a parameter C was introduced. Consider two extreme cases. When C = 0, the second goal can be ignored; this is the case of a linearly separable set. When C → ∞, the margin is ignored. C is a parameter that the user of the SVM can choose to tune the trade-off between the width of the margin and the number of nonzero s_i. So the optimization problem for the linearly nonseparable case is the following:

(77) min (1/2)||w||² + C Σ_{i=1}^N s_i
     s.t. t_i(w·x_i + b) ≥ 1 − s_i
          s_i ≥ 0
for i = 1, 2, ..., N.

Now consider the Lagrangian function of (77):

(78) L(w, b, s, λ, μ) = (1/2)||w||² + C Σ_{i=1}^N s_i − Σ_{i=1}^N λ_i [t_i(wᵀx_i + b) − 1 + s_i] − Σ_{i=1}^N μ_i s_i

and differentiate with respect to the original variables:

(79) ∂L/∂w (w, b, s, λ, μ) = w − Σ_{i=1}^N λ_i t_i x_i = 0
(80) ∂L/∂b (w, b, s, λ, μ) = −Σ_{i=1}^N λ_i t_i = 0
(81) ∂L/∂s_i (w, b, s, λ, μ) = C − λ_i − μ_i = 0

From (79),

(82) w = Σ_{i=1}^N λ_i t_i x_i.

Substituting (82) into (78),

L(w, b, s, λ, μ) = (1/2) Σ_i Σ_j t_i t_j λ_i λ_j (x_i·x_j) + C Σ_i s_i − Σ_i Σ_j t_i t_j λ_i λ_j (x_i·x_j) − b Σ_i λ_i t_i + Σ_i λ_i − Σ_i λ_i s_i − Σ_i μ_i s_i
 = −(1/2) Σ_i Σ_j t_i t_j λ_i λ_j (x_i·x_j) + Σ_i λ_i + C Σ_i s_i − Σ_i λ_i s_i − Σ_i μ_i s_i
 = −(1/2) Σ_i Σ_j t_i t_j λ_i λ_j (x_i·x_j) + Σ_i λ_i    (by (81))

Thus the dual of the linearly nonseparable SVM is obtained as follows:

(83) max −(1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j (x_i·x_j) + Σ_{i=1}^N λ_i
     s.t. Σ_{i=1}^N λ_i t_i = 0
          0 ≤ λ_i ≤ C, i = 1, 2, ..., N
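The slack variables and the soft-margin objective (76) can be evaluated directly. The sketch below uses hypothetical data and names, not anything from the dissertation's experiments; it shows a point whose slack exceeds 1, i.e., Case 3 above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, C, X, t):
    # Smallest slacks satisfying t_i (w.x_i + b) >= 1 - s_i, s_i >= 0,
    # then the objective (1/2)||w||^2 + C * sum_i s_i from (76).
    s = [max(0.0, 1.0 - ti * (dot(w, xi) + b)) for xi, ti in zip(X, t)]
    return 0.5 * dot(w, w) + C * sum(s), s

# Hypothetical 1-D set; the point x = -0.5 with label +1 violates the margin.
X = [[-1.0], [1.0], [-0.5]]
t = [-1, 1, 1]
obj, s = soft_margin_objective([1.0], 0.0, 1.0, X, t)
print(s, obj)   # [0.0, 0.0, 1.5] 2.0  -- slack greater than 1 is Case 3
```

Raising C makes the slack term dominate the objective, which is exactly the trade-off described after (76).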
3.5. Kernel SVM

In many problems, a linear classifier cannot achieve both high performance and good generalization in the original space.

Figure 3.5. Nonlinear Classifier

In Figure 3.5, a linear classifier cannot separate the two data sets correctly, so a nonlinear classifier must be considered. As a result, a nonlinear classifier produces an efficient generalization for future tests; as shown in Figure 3.5, the nonlinear classifier has good performance (i.e., generalization). Then how does an SVM separate a nonlinearly separable data set? Consider a training set S = {(x_1, t_1), ..., (x_N, t_N)} which is not linearly separable. However, S can be linearly separable in a different space, usually of higher dimension than the original space, using a set of real functions φ_1, ..., φ_M. Here, φ is assumed to be unknown. These functions are called features. Using φ, x = (x_1, ..., x_m) can be mapped to φ(x) = (φ_1(x), ..., φ_M(x)). After mapping to the feature space, the new training set S' = {(φ(x_1), t_1), ..., (φ(x_N), t_N)} is obtained. To see this clearly, consider the following example. Let S_1 = {(a, 1), (b, −1), (c, −1), (d, 1)} where a = (0, 0), b = (1, 0), c = (0, 1) and d = (1, 1). S_1 cannot be linearly separated without making an error.
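That S_1 cannot be separated by any linear classifier can be spot-checked numerically. The sketch below assumes the XOR labelling t_a = t_d = +1 and t_b = t_c = −1 (the signs of the labels are garbled in the extracted text, so this labelling is an assumption); a grid search over candidate classifiers is a demonstration, not a proof.

```python
def separates(w1, w2, b, pts):
    # True if t * (w.x + b) > 0 for every labelled point (x, t).
    return all(t * (w1 * x1 + w2 * x2 + b) > 0 for (x1, x2), t in pts)

# Assumed XOR labelling: a, d -> +1 and b, c -> -1.
S1 = [((0, 0), 1), ((1, 0), -1), ((0, 1), -1), ((1, 1), 1)]

# Spot-check a grid of candidate classifiers; none separates S1.
vals = [x / 4.0 for x in range(-8, 9)]
found = any(separates(w1, w2, b, S1)
            for w1 in vals for w2 in vals for b in vals)
print(found)   # False
```

In fact no classifier exists at all: the constraints force b > 0, w1 + b < 0, w2 + b < 0, and w1 + w2 + b > 0, and adding the middle two gives w1 + w2 + b < −b < 0, a contradiction.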
Now let φ(x) = (x_1², √2 x_1x_2, x_2²) where x = (x_1, x_2). Under the feature function φ,

a = (0, 0) → φ(a) = a' = (0, 0, 0)
b = (1, 0) → φ(b) = b' = (1, 0, 0)
c = (0, 1) → φ(c) = c' = (0, 0, 1)
d = (1, 1) → φ(d) = d' = (1, √2, 1)

First, one can check that S_1' = {(a', 1), (b', −1), (c', −1), (d', 1)} is linearly separable in R³; here φ(x) is seen explicitly. However, as mentioned earlier, φ is usually unknown if a kernel function is used. Let x = (x_1, x_2) and y = (y_1, y_2), and define

K(x, y) = (x·y)² = x_1²y_1² + 2x_1y_1x_2y_2 + x_2²y_2².

One can easily check that

K(a, b) = φ(a)·φ(b) and K(c, d) = φ(c)·φ(d).

An SVM uses a kernel function whose value is the same as the value of the inner product in the feature space. Thus, using the new training set S' and (83), consider the following optimization problem:

(84) max −(1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j φ(x_i)·φ(x_j) + Σ_{i=1}^N λ_i
     s.t. Σ_{i=1}^N λ_i t_i = 0
          0 ≤ λ_i ≤ C, i = 1, 2, ..., N

Now define the kernel K(x_i, x_j) = φ(x_i)·φ(x_j) for any inner product (x_i, x_j) in the original space. Then (84) can be written as the following:

(85) max −(1/2) Σ_{i=1}^N Σ_{j=1}^N t_i t_j λ_i λ_j K(x_i, x_j) + Σ_{i=1}^N λ_i
     s.t. Σ_{i=1}^N λ_i t_i = 0
          0 ≤ λ_i ≤ C, i = 1, 2, ..., N

By the definition of a kernel, K(x_i, x_j) always corresponds to the inner product in some feature space, and the kernel K(x_i, x_j) can be computed without computing the images φ(x_i) and φ(x_j). Replacing the inner products in (84) by K as in (85) is called kernel substitution. By using kernel substitution, inner product evaluations can be carried out in the original space without knowing the feature space. The following are popular kernel functions:

1. Polynomial kernel: K(x_i, x_j) = [(x_i·x_j) + c]^d, where d is the degree of the polynomial and c is a constant.
2. Radial basis function kernel: K(x_i, x_j) = e^{−||x_i − x_j||²/(2γ²)}, where γ is a parameter.
3. Sigmoid kernel: K(x_i, x_j) = tanh[α(x_i·x_j) + β], where α and β are parameters.
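The kernel trick and the kernels listed above can be written down directly. The sketch below (illustrative names and hypothetical default parameters, not the dissertation's code) verifies K(x, y) = (x·y)² = φ(x)·φ(y) on the points a, b, c, d from the earlier example and implements the three popular kernels.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    # Feature map from the text: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(x, y, d=3, c=1.0):
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * gamma ** 2))

def sigmoid_kernel(x, y, alpha=1.0, beta=0.0):
    return math.tanh(alpha * dot(x, y) + beta)

# K(x, y) = (x . y)^2 equals the feature-space inner product phi(x).phi(y).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ok = all(abs(dot(p, q) ** 2 - dot(phi(p), phi(q))) < 1e-12
         for p in pts for q in pts)
print(ok)                                   # True
print(poly_kernel([1.0, 0.0], [0.0, 1.0]))  # (0 + 1)^3 = 1.0
```

The check never forms φ explicitly when evaluating (x·y)², which is the point of kernel substitution.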
CHAPTER 4
SOLVING SUPPORT VECTOR MACHINES USING THE NEW ALGORITHM

4.1. Solving SVM Problems Using the New Algorithm

This chapter shows that Algorithm 2.1 can be used to solve SVM problems. Section 4.1 shows that solving an SVM is equivalent to finding the minimum distance between two convex hulls. Finding the maximal margin of a training data set and finding the minimum distance between the two convex hulls given by the training data set seem to be different processes; yet, after reformulating SVM-NV, SVM-NV can be recognized as an NPP. Let {x_i}_{i=1}^m be a training data set, let I be the index set for the first class and J be the index set for the second class. Also, set

U = {u : u = Σ_{i∈I} β_i x_i, Σ_{i∈I} β_i = 1, β_i ≥ 0, i ∈ I}

and

V = {v : v = Σ_{j∈J} β_j x_j, Σ_{j∈J} β_j = 1, β_j ≥ 0, j ∈ J}.

The reformulation of SVM-NV was done by Keerthi [2]: the SVM dual is equivalent to

(86) min (1/2) Σ_s Σ_t β_s β_t t_s t_t (x_s, x_t)
     s.t. β_s ≥ 0, Σ_{i∈I} β_i = 1, Σ_{j∈J} β_j = 1

Now consider an NPP problem, in which ||u − v||² needs to be minimized:

||u − v||² = ||Σ_{i∈I} β_i x_i − Σ_{j∈J} β_j x_j||²
 = (Σ_i β_i x_i, Σ_i β_i x_i) − (Σ_i β_i x_i, Σ_j β_j x_j) − (Σ_j β_j x_j, Σ_i β_i x_i) + (Σ_j β_j x_j, Σ_j β_j x_j)
(87) = Σ_i Σ_{i'} β_i β_{i'} (x_i, x_{i'}) − Σ_i Σ_j β_i β_j (x_i, x_j) − Σ_j Σ_i β_j β_i (x_j, x_i) + Σ_j Σ_{j'} β_j β_{j'} (x_j, x_{j'}).

Note that in (87), if the inner product is taken between two vectors from the same index set, then the sign of the term is positive; otherwise, it is negative. Thus, (87) can be written as the following:

(88) Σ_s Σ_t β_s β_t t_s t_t (x_s, x_t)

where t_s t_t = 1 if x_s and x_t are from the same index set, and t_s t_t = −1 otherwise.

The minimum distance between two convex hulls is nonzero if and only if the two convex hulls do not intersect; that is, if the two convex hulls intersect, then the minimum distance between them is zero. As shown earlier in chapter 3, real SVM problems have some violations, and if some x_i violates, then the two convex hulls could intersect. However, a minimum distance of zero between the two convex hulls is not desired. Here, Friess recommends using a sum of squared violations in the cost function:

min (1/2)||w||² + (C/2) Σ_k s_k²
s.t. w·x_i + b ≥ 1 − s_i, i ∈ I
     w·x_j + b ≤ −1 + s_j, j ∈ J

This problem is called the SVM with a sum of squared violations (SVM-VQ). SVM-VQ can be converted to (89) by a simple transformation. To see this, set

w̃ = (w, √C s);  b̃ = b;  x̃_i = (x_i, e_i/√C), i ∈ I  and  x̃_j = (x_j, −e_j/√C), j ∈ J

where e_i is the m-dimensional vector in which the i-th component is 1 and all others are 0. Then

(1/2)||w̃||² = (1/2)||w||² + (C/2) Σ_k s_k²
w̃·x̃_i + b̃ = w·x_i + b + s_i
w̃·x̃_j + b̃ = w·x_j + b − s_j
Thus, SVM-VQ is equivalent to

(89) min (1/2)||w̃||²
     s.t. w̃·x̃_i + b̃ ≥ 1, i ∈ I
          w̃·x̃_j + b̃ ≤ −1, j ∈ J

Here, the feasible space of SVM-VQ is nonempty because of the slack variables s_i, and the solution of (89) is automatically feasible. By setting

x̃_i = (x_i, e_i/√C), i ∈ I  and  x̃_j = (x_j, −e_j/√C), j ∈ J,

the new algorithm can be used as usual.

4.2. Stopping Method for the Algorithm

This section gives a stopping method for the NPP solved with Algorithm 2.1, using the KKT conditions of optimization theory. First, consider the general quadratic programming problem:

(Quadratic Programming Problem)
min (1/2)xᵀQx + cᵀx + d
s.t. Ax − b = 0
     x ≥ 0.

The KKT conditions for this quadratic program come from the Lagrangian

L(x, λ, μ) = (1/2)xᵀQx + cᵀx + d − λᵀ(Ax − b) − μᵀx:

1. ∂L(x, λ, μ)/∂x = Qx + c − Aᵀλ − μ = 0
2. λᵀ(Ax − b) = 0, μᵀx = 0
3. μ ≥ 0 (λ is unrestricted, since Ax − b = 0 is an equality constraint)

By the KKT conditions, if x_i ≠ 0 then (Qx + c − Aᵀλ)_i = 0. This means that if the nonzero indices are known, then (Qx + c − Aᵀλ)_i = 0 on those indices; together with condition 2, a linear system can be solved to solve the quadratic programming problem.
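The identities behind the SVM-VQ transformation can be verified numerically. The following sketch uses hypothetical values of C, w, the slacks s, and a point x_0 from index set I (its index is i = 0); none of these come from the dissertation. It checks (1/2)||w̃||² = (1/2)||w||² + (C/2) Σ s_k² and w̃·x̃_i = w·x_i + s_i.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

C = 4.0
w = [1.0, -2.0]
s = [0.5, 0.0, 1.5]      # hypothetical slacks, one per training point
x0 = [2.0, 1.0]          # a point from index set I (its index is i = 0)
rC = math.sqrt(C)

# Augmented vectors: w~ = (w, sqrt(C) s) and x~_i = (x_i, e_i / sqrt(C)).
w_t = w + [rC * si for si in s]
x0_t = x0 + [1.0 / rC, 0.0, 0.0]

lhs = 0.5 * dot(w_t, w_t)
rhs = 0.5 * dot(w, w) + (C / 2.0) * sum(si * si for si in s)
print(lhs, rhs)                            # 7.5 7.5
print(dot(w_t, x0_t), dot(w, x0) + s[0])   # 0.5 0.5  (w~.x~_i = w.x_i + s_i)
```

The same check with x̃_j = (x_j, −e_j/√C) produces w·x_j − s_j, so the squared-slack penalty is absorbed into a hard-margin problem in the augmented space.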
By (88), the NPP problem is equivalent to the following quadratic programming problem:

(NPP-QP) min (1/2)aᵀAa, where A = [XᵀX  −XᵀY; −YᵀX  YᵀY]
         s.t. Σ_i a_i = 2, a ≥ 0

Consider the Lagrangian for NPP-QP,

L(a, λ, μ) = (1/2)aᵀAa − λ(aᵀe − 2) − μᵀa.

Then the KKT conditions give:

1. Aa − λe − μ = 0
2. λ(aᵀe − 2) = 0
3. μᵀa = 0, μ ≥ 0
4. By 1, μ = Aa − λe
5. By 3 and 4, if a_i ≠ 0, then (Aa − λe)_i = 0
6. By 4 and μ ≥ 0, if a_i = 0, then (Aa − λe)_i ≥ 0

By the above, if the nonzero indices of a are known, only the linear system given by conditions 2 and 5 needs to be solved. Finally, by Theorem 4 in chapter 2, the indices for u_k and v_k do not change for large k; from this, it can be assumed that u_k and v_k lie on G and G' respectively. Once u_k and v_k are on G and G' respectively, the nonzero indices of a in NPP-QP can be determined. After solving the linear system from conditions 2 and 5, condition 6 must be checked to confirm that the zero indices are satisfied.
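The KKT conditions 4-6 of NPP-QP can be checked mechanically. The sketch below is illustrative only: it builds A for a hypothetical one-point-per-hull instance (so the solution is known by inspection), evaluates the objective, and verifies conditions 5 and 6 for a candidate pair (a, λ).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

# One point per hull (hypothetical): x1 = (1, 0) in X and x2 = (0, 1) in Y.
# A = [[X'X, -X'Y], [-Y'X, Y'Y]] for these points:
A = [[1.0, 0.0],
     [0.0, 1.0]]
a = [1.0, 1.0]          # feasible: sum a_i = 2 and a >= 0
lam = 1.0               # candidate multiplier

# Objective (1/2) a'Aa equals (1/2)||u - v||^2 here, with u = x1 and v = x2.
half_obj = 0.5 * dot(a, matvec(A, a))
print(half_obj)         # 1.0 = (1/2) * ||(1,0) - (0,1)||^2

# Conditions 4-6: mu = Aa - lam*e; mu_i = 0 where a_i > 0, mu_i >= 0 where a_i = 0.
mu = [r - lam for r in matvec(A, a)]
ok = all((abs(m) < 1e-12 if ai > 0 else m >= -1e-12) for m, ai in zip(mu, a))
print(ok)               # True: (a, lam) satisfies the KKT conditions of NPP-QP
```

In the stopping method, the same residual test is applied after solving the linear system from conditions 2 and 5 on the presumed nonzero indices.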
CHAPTER 5
EXPERIMENTS

5.1. Comparison with Linearly Separable Sets

This section compares the performance of Algorithm 2.1 and sequential minimal optimization (SMO) on linearly separable sets. The MATLAB program was used to compare the performance. First, the performance of Algorithm 2.1 and the MDM algorithm are compared in dimension 100; by increasing the number of points, the results of the experiments can be seen in Figure 5.1. Second, twenty non-intersecting pairs of convex hulls in dimensions 100, 200, 300, 500, 700, 1000, 1200, 2000, and 2500 are randomly generated, and the minimum distances between the two convex hulls are computed. In each dimension, the number of vectors is changed to see the performance of the two algorithms. The following graphs show CPU times for Algorithm 2.1 and SMO with a linear kernel with coefficient 0, a linear kernel with coefficient 1, a polynomial kernel with degree 3, and a polynomial kernel with degree 5, denoted in the graphs by Lin(0), Lin(1), P3, and P5 respectively. Also in the graphs, an expression of the form (number of points)X(dimension), for example 7000X100, indicates that 7000 vectors are in dimension 100. Figures 5.2 to 5.12 illustrate the results for the experiments in section 5.1.

5.2. Comparison with Real World Problems

In this section, the performance of Algorithm 2.1 and sequential minimal optimization (SMO) is compared on real world problems. Four kernel functions are used to compare the performance of Algorithm 2.1 and SMO: a polynomial kernel with degree 3, a polynomial kernel with degree 5, a radial basis kernel with gamma 1, and a radial basis kernel with gamma 10. The polynomial kernel with degree 3, polynomial kernel with degree 5, radial basis
kernel with gamma 1 and radial basis kernel with gamma 10 are denoted d3, d5, g1 and g10 respectively. Figures 5.13 to 5.16 illustrate the results for the experiments in section 5.2.

[Figure 5.1. Algorithm 2.1 and the MDM Algorithm — CPU times versus the number of points.]
[Figures 5.2 to 5.12. CPU seconds of SMO and Algorithm 2.1 with the kernels Lin(0), Lin(1), P3, and P5 on randomly generated linearly separable sets of various sizes (e.g., 1000X1000, 2000X2000, 7000 points) in dimensions ranging from 100 to 2500.]
[Figures 5.13 to 5.16. CPU seconds of SMO and Algorithm 2.1 with the kernels P3, P5, g1, and g10 on the Heart (270X13), Breast Cancer (263X9), Titanic (24X3), and Thyroid (215X5) data sets.]
CHAPTER 6
CONCLUSION AND FUTURE WORKS

6.1. Conclusion

As shown in section 1 of chapter 5, the performance of SMO is better than that of Algorithm 2.1 on the smaller configurations, while on the larger configurations and in higher dimensions Algorithm 2.1 works better than SMO. Algorithm 2.1 performs better as the dimension grows. To see this, fix the number of points at 7000 and increase the dimension: in dimensions 100 and 300, SMO performs faster than Algorithm 2.1; however, on data sets randomly generated by MATLAB, the CPU time of Algorithm 2.1 is shorter than that of SMO in dimensions 500 and 700. Therefore, on randomly generated linearly separable sets in higher dimensions, Algorithm 2.1 performs better than SMO. Like many application problems, the data sets generated by MATLAB usually have a small number of support vectors; if a data set has a large number of support vectors, then Algorithm 2.1 may perform slower than SMO. Unfortunately, most real world data sets are in a low dimension format, and since the performance of SMO is better than that of Algorithm 2.1 on low-dimensional linearly separable sets, SMO performed better on those data sets. But as shown in section 2 of chapter 5, Algorithm 2.1 is still comparable with SMO. By Theorem 4 in chapter 2 and the stopping method in section 2 of chapter 4, Algorithm 2.1 can find the solution of the NPP in a finite number of iterations.

6.2. Future Work

In Algorithm 2.1, two coefficients of the representation of u_k are updated at each iteration, but note that, in fact, four or more coefficients of u_k can be updated. Instead of using
u_{k+1} = u_k + α_k(x_k − x_k'), the update u_{k+1} = u_k + t_k ᾱ_k (x_min,av − x_max,av) can be used. In the new update, 0 ≤ t_k ≤ 1, ᾱ_k = min{α_{k'}, α_{2,k'}}, x_min,av = (x_k + x_{2,k})/2 and x_max,av = (x_k' + x_{2,k}')/2, where

(x_k, u_k − v_k) = min_i (x_i, u_k − v_k),  (x_{2,k}, u_k − v_k) = min_{i: x_i ≠ x_k} (x_i, u_k − v_k),
(x_k', u_k − v_k) = max_{i: a_i > 0} (x_i, u_k − v_k)  and  (x_{2,k}', u_k − v_k) = max_{i: a_i > 0, x_i ≠ x_k'} (x_i, u_k − v_k).

With this expression, four coefficients in the representation of u_{k+1} are updated at a time. The precise method of updating four coefficients at a time is shown in Appendix B. When Algorithm 2.1 and the improved version of Algorithm 2.1 are compared, the number of iterations for the improved version is always smaller than for Algorithm 2.1. By using the above method, the CPU time of Algorithm 2.1 is improved; moreover, the improved Algorithm 2.1 could perform better than SMO in lower dimensions.
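A rough sketch of how the averaged points might be selected is given below. The function name, the tie-breaking by stable sort, and the omission of the step-size logic of Appendix B are all simplifications assumed here, not the dissertation's actual implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def averaged_directions(X, alpha, w):
    # Score each point by (x_i, w) with w = u_k - v_k.  x_min_av averages the
    # two points with the smallest scores; x_max_av averages the two points
    # with the largest scores among those with positive coefficients a_i.
    scored = sorted(range(len(X)), key=lambda i: dot(X[i], w))
    lo = scored[:2]
    hi = [i for i in scored if alpha[i] > 0][-2:]
    d = len(X[0])
    x_min_av = [(X[lo[0]][k] + X[lo[1]][k]) / 2 for k in range(d)]
    x_max_av = [(X[hi[0]][k] + X[hi[1]][k]) / 2 for k in range(d)]
    return x_min_av, x_max_av

# Hypothetical square of points with equal coefficients.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = [0.25, 0.25, 0.25, 0.25]
mn, mx = averaged_directions(X, alpha, w=[1.0, 0.0])
print(mn, mx)   # [0.0, 0.5] [1.0, 0.5]
```

The update direction x_min,av − x_max,av then moves weight from the two worst points toward the two best, touching four coefficients per iteration as described above.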
Appendix A
IMPROVED MDM

Define u_2' by (u_2', u) = max_{α_j > 0, x_j ≠ u'} (x_j, u) and u_2'' by (u_2'', u) = min_{x_j ≠ u''} (x_j, u), and set u_av' = (u' + u_2')/2 and u_av'' = (u'' + u_2'')/2. This section presents the improved MDM algorithm, which uses the average points u_av' and u_av''.

Improved MDM

Step 1. Choose u_k ∈ U and find u_k' and u_k''.
Step 2. Find u_{2k}' and u_{2k}''. If u_{2k}' does not exist, then use the MDM. If (u_{2k}'', u_k) ≥ (u_{2k}', u_k), then use the MDM. If (u_{2k}'', u_k) > (u_k', u_k), then use the MDM. If δ(u_k) > Δ_k(u_k) = (u_avk' − u_avk'', u_k), then use the MDM.
Step 3. Otherwise, set u_k(t) = u_k + 2tᾱ_k(u_avk'' − u_avk') where ᾱ_k = min{α_k', α_{2k}'} and t ∈ [0, 1]. Here ᾱ_k = min{α_k', α_{2k}'}, the smaller of the coefficients of u_k' and u_{2k}', is used since each coefficient of u_k(t) needs to remain greater than or equal to 0 in order to stay in the convex hull.
Step 4. Set u_{k+1} = u_k + 2t_kᾱ_k(u_avk'' − u_avk') where (u_k(t_k), u_k(t_k)) = min_{t∈[0,1]} (u_k(t), u_k(t)).
Step 5. Iterate until Δ_k(u_k) = (u_avk' − u_avk'', u_k) ≤ 0.

In the representation u_k = Σ_{i=1}^m α_i^k x_i, the improved MDM updates 4 coefficients of the representation while the MDM updates 2; this makes each iteration faster. However, updating too many coefficients could make the iteration slow near the solution u*. By Step 4, ||u_{k+1}|| ≤ ||u_k|| is obtained.

The proof of the convergence.

Assumption 1.
(1) u_{2k}' exists.
(2) (u_{2k}', u_k) > (u_{2k}'', u_k).
(3) (u_{2k}'', u_k) ≤ (u_k', u_k).
(4) δ(u_k) ≤ Δ_k(u_k).
Define δ(u) = (u, u) − (u_av'', u). Then clearly δ(u) ≥ 0. By Corollary 2 to Lemma 2 in [1], if {u_k}, u_k ∈ U, k = 0, 1, 2, ... is a sequence of points such that ||u_{k+1}|| ≤ ||u_k|| and there is a subsequence {u_kj} such that δ(u_kj) → 0, then u_k → u*. Also, by Assumption 1, for any u = Σ_{i=1}^m α_i x_i ∈ U, δ(u) ≤ Δ(u).

Lemma A.1. lim_{k→∞} ᾱ_k Δ_k(u_k) = 0, where ᾱ_k = min{α_k', α_{2k}'}.

(Proof) Let u_k(t) = u_k + 2tᾱ_k(u_avk'' − u_avk'). Then

(u_k(t), u_k(t)) = (u_k, u_k) − 4tᾱ_kΔ_k(u_k) + 4ᾱ_k² t² ||u_avk'' − u_avk'||².

Suppose lim_{k→∞} ᾱ_kΔ_k(u_k) ≠ 0. Then there is a subsequence {u_kj} such that ᾱ_kjΔ_kj(u_kj) ≥ ε > 0. Since ᾱ_k ≤ 1 and ||u_avk'' − u_avk'|| ≤ d, where d = max ||x_l − x_p||,

(u_kj(t), u_kj(t)) ≤ (u_kj, u_kj) − 4tε + 4t²d².

The right-hand side is minimized at t* = min{ε/(2d²), 1}. If t* = 1, then ε ≥ 2d², so

(u_kj(1), u_kj(1)) ≤ (u_kj, u_kj) − 4ε + 4d² ≤ (u_kj, u_kj) − 2ε.

If t* = ε/(2d²), then

(u_kj(t*), u_kj(t*)) ≤ (u_kj, u_kj) − 2ε²/d² + ε²/d² = (u_kj, u_kj) − ε²/d².

In either case the squared norm decreases by a fixed positive amount at each k_j, while by Step 4 the sequence (u_k, u_k) is nonincreasing and bounded below by 0. This is a contradiction. Thus lim_{k→∞} ᾱ_kΔ_k(u_k) = 0.

A.2. lim_{k→∞} Δ_k(u_k) = 0.

(Proof) Suppose lim_{k→∞} Δ_k(u_k) = Δ_#(u_#) > 0. Then for large k > K, Δ_k(u_k) ≥ Δ_#(u_#)/2. By Lemma A.1, ᾱ_k → 0. Note that (u_k(t), u_k(t)) is minimized at t_k = Δ_k(u_k)/(2ᾱ_k ||u_avk'' − u_avk'||²). Hence, for large k > K₁ ≥ K, t_k = 1, and thus u_{k+1} = u_k + 2ᾱ_k(u_avk'' − u_avk'). Let {u_kj} be a subsequence such that Δ_kj(u_kj) → Δ_#(u_#); in the sequence {u_kj}, some terms can be omitted if necessary. A subsequence satisfying the following 3 conditions can be found:
(1) α_kj → α_#
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationOn the correction of the h-index for career length
1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More information8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars
More informationTHE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens
THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of
More informationThe Study of Teaching-learning-based Optimization Algorithm
Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationFormulas for the Determinant
page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use
More informationA New Refinement of Jacobi Method for Solution of Linear System Equations AX=b
Int J Contemp Math Scences, Vol 3, 28, no 17, 819-827 A New Refnement of Jacob Method for Soluton of Lnear System Equatons AX=b F Naem Dafchah Department of Mathematcs, Faculty of Scences Unversty of Gulan,
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More informationAdditional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty
Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationn α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0
MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationA 2D Bounded Linear Program (H,c) 2D Linear Programming
A 2D Bounded Lnear Program (H,c) h 3 v h 8 h 5 c h 4 h h 6 h 7 h 2 2D Lnear Programmng C s a polygonal regon, the ntersecton of n halfplanes. (H, c) s nfeasble, as C s empty. Feasble regon C s unbounded
More informationOn a direct solver for linear least squares problems
ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationInexact Newton Methods for Inverse Eigenvalue Problems
Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationLecture 21: Numerical methods for pricing American type derivatives
Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)
More informationA Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function
A Local Varatonal Problem of Second Order for a Class of Optmal Control Problems wth Nonsmooth Objectve Functon Alexander P. Afanasev Insttute for Informaton Transmsson Problems, Russan Academy of Scences,
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More information6.854J / J Advanced Algorithms Fall 2008
MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms
More informationSalmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2
Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to
More informationA new Approach for Solving Linear Ordinary Differential Equations
, ISSN 974-57X (Onlne), ISSN 974-5718 (Prnt), Vol. ; Issue No. 1; Year 14, Copyrght 13-14 by CESER PUBLICATIONS A new Approach for Solvng Lnear Ordnary Dfferental Equatons Fawz Abdelwahd Department of
More informationMin Cut, Fast Cut, Polynomial Identities
Randomzed Algorthms, Summer 016 Mn Cut, Fast Cut, Polynomal Identtes Instructor: Thomas Kesselhem and Kurt Mehlhorn 1 Mn Cuts n Graphs Lecture (5 pages) Throughout ths secton, G = (V, E) s a mult-graph.
More informationAdvanced Introduction to Machine Learning
Advanced Introducton to Machne Learnng 10715, Fall 2014 The Kernel Trck, Reproducng Kernel Hlbert Space, and the Representer Theorem Erc Xng Lecture 6, September 24, 2014 Readng: Erc Xng @ CMU, 2014 1
More informationLecture 5 Decoding Binary BCH Codes
Lecture 5 Decodng Bnary BCH Codes In ths class, we wll ntroduce dfferent methods for decodng BCH codes 51 Decodng the [15, 7, 5] 2 -BCH Code Consder the [15, 7, 5] 2 -code C we ntroduced n the last lecture
More informationInteractive Bi-Level Multi-Objective Integer. Non-linear Programming Problem
Appled Mathematcal Scences Vol 5 0 no 65 3 33 Interactve B-Level Mult-Objectve Integer Non-lnear Programmng Problem O E Emam Department of Informaton Systems aculty of Computer Scence and nformaton Helwan
More informationSome Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)
Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998
More information