1 The Primal and Dual of an Optimization Problem

Ross Edwards
6 years ago

CS 189 Introduction to Machine Learning, Fall 2017, Note 18

Previously, in our investigation of SVMs, we formulated a constrained optimization problem that we can solve to find the optimal parameters for our hyperplane decision boundary. Recall the setup of soft-margin SVMs:

- $y_i$'s: $\pm 1$, representing positive or negative class
- $x_i$'s: feature vectors in $\mathbb{R}^d$
- $\xi_i$'s: slack variables representing how much an $x_i$ is allowed to violate the margin
- $C$: a hyperparameter describing how severely we penalize slack

The optimization problem for $w \in \mathbb{R}^d$ and $t \in \mathbb{R}$, the parameters of the SVM:

$$\min_{w,t,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(w^\top x_i - t) \ge 1 - \xi_i \ \ \forall i, \qquad \xi_i \ge 0 \ \ \forall i$$

Now, we will investigate the dual of this problem, which will motivate our discussion of kernels. Before we do so, we first have to understand the primal and dual of an optimization problem.

1 The Primal and Dual of an Optimization Problem

All optimization problems can be expressed in the standard form

$$\min_{x} \ f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \ i = 1,\dots,m, \qquad h_j(x) = 0, \ j = 1,\dots,n \tag{1}$$

For the purposes of our discussion, assume that $x \in \mathbb{R}^d$. The main components of an optimization problem are:

- The objective function $f_0(x)$
- The inequality constraints: expressions involving $f_i(x)$
- The equality constraints: expressions involving $h_j(x)$

Working with the constraints can be cumbersome and challenging to manipulate, and it would be ideal if we could somehow turn this constrained optimization problem into an unconstrained one. One idea is to re-express the optimization problem as

$$\min_{x} \ \mathcal{L}(x) \tag{2}$$

where

$$\mathcal{L}(x) = \begin{cases} f_0(x) & \text{if } f_i(x) \le 0 \ \forall i \in [1,m] \text{ and } h_j(x) = 0 \ \forall j \in [1,n] \\ \infty & \text{otherwise} \end{cases}$$

Note that the unconstrained optimization problem above is equivalent to the original constrained problem. Even though the unconstrained problem considers values that violate the constraints (and therefore are not in the feasible set for the constrained optimization problem), it will effectively ignore them because they are treated as $\infty$ in a minimization problem.

Even though we are now dealing with an unconstrained problem, it is still difficult to solve, because we still have to deal with all of the casework in the objective function $\mathcal{L}(x)$. To resolve this issue, we have to introduce dual variables, specifically one set of dual variables for the equality constraints and one set for the inequality constraints. First, let's deal with the equality constraints. If we only take into account the dual variables for the equality constraints, the optimization problem now becomes

$$\min_{x} \max_{\nu} \ \mathcal{L}(x,\nu) \tag{3}$$

where

$$\mathcal{L}(x,\nu) = \begin{cases} f_0(x) + \sum_{j=1}^{n} \nu_j h_j(x) & \text{if } f_i(x) \le 0 \ \forall i \in [1,m] \\ \infty & \text{otherwise} \end{cases}$$

We are still working with an unconstrained optimization problem, except that now we are optimizing over two sets of variables: the primal variables $x \in \mathbb{R}^d$ and the dual variables $\nu \in \mathbb{R}^n$. Also note that the optimization problem has now become a nested one, with an inner optimization problem that maximizes over the dual variables and an outer optimization problem that minimizes over the primal variables. Let's examine why this optimization problem is equivalent to the original constrained optimization problem:

- Any $x$ that violates the inequality constraints is still treated as $\infty$ by the outer minimization problem over $x$ and is therefore ignored.
- For any $x$ that violates the equality constraints (meaning that $\exists j$ s.t. $h_j(x) \ne 0$), the inner maximization problem over $\nu$ can choose $\nu_j \to \infty$ if $h_j(x) > 0$ (or $\nu_j \to -\infty$ if $h_j(x) < 0$) to make the inner maximization blow up to $\infty$, so this $x$ is ignored by the outer minimization over $x$.
- For any $x$ that does not violate any of the equality or inequality constraints, the inner maximization problem over $\nu$ is simply equal to $f_0(x)$.
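The blow-up argument for equality constraints can be checked numerically. The sketch below is a minimal illustration on a made-up one-dimensional problem (minimize $x^2$ subject to $x - 1 = 0$), which is not part of the note. The function name `inner_max` and the cap `nu_bound` are assumptions introduced here: since the true maximization is over all of $\mathbb{R}$, the code only approximates it on a bounded interval and shows the trend as the bound grows.

```python
def f0(x):
    # Objective of the toy problem (an assumption for illustration).
    return x ** 2

def h(x):
    # Single equality constraint h(x) = 0, i.e. x = 1.
    return x - 1.0

def inner_max(x, nu_bound):
    """Approximate max over nu in [-nu_bound, nu_bound] of f0(x) + nu * h(x).

    The expression is linear in nu, so its maximum over an interval is
    attained at one of the two endpoints.
    """
    return max(f0(x) + nu * h(x) for nu in (-nu_bound, nu_bound))

for bound in (1e1, 1e3, 1e6):
    # x = 1.0 is feasible; x = 1.5 and x = 0.5 violate the equality constraint.
    print(bound, inner_max(1.0, bound), inner_max(1.5, bound), inner_max(0.5, bound))
```

For the feasible point the value stays at $f_0(1) = 1$ regardless of the bound, while for both infeasible points it grows linearly with the bound, which is the finite-range version of "the inner maximization blows up to $\infty$".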

This solution comes at a cost: in an effort to remove the equality constraints, we had to add in dual variables, one for each equality constraint. With this in mind, let's try to do the same for the inequality constraints. Adding in a dual variable $\lambda_i$ to represent each inequality constraint, we now have

$$\min_{x} \max_{\lambda,\nu} \ \mathcal{L}(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{n} \nu_j h_j(x) \quad \text{s.t.} \quad \lambda_i \ge 0, \ i = 1,\dots,m$$

For convenience, we can place the constraints involving $\lambda$ into the optimization variable.

$$\min_{x} \max_{\lambda \ge 0,\,\nu} \ \mathcal{L}(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{n} \nu_j h_j(x)$$

The optimization problem above is otherwise known as the primal, and its optimal value is indeed equivalent to that of the original constrained optimization problem.

$$p^* = \min_{x} \max_{\lambda \ge 0,\,\nu} \ \mathcal{L}(x,\lambda,\nu) \tag{4}$$

We can verify that this is indeed the case:

- For any $x$ that violates the inequality constraints (meaning that $\exists i \in [1,m]$ s.t. $f_i(x) > 0$), the inner maximization problem over $\lambda$ can choose $\lambda_i \to \infty$ to make the inner maximization blow up to $\infty$, so this $x$ is ignored by the outer minimization over $x$.
- For any $x$ that violates the equality constraints (meaning that $\exists j$ s.t. $h_j(x) \ne 0$), the inner maximization problem over $\nu$ can choose $\nu_j \to \infty$ if $h_j(x) > 0$ (or $\nu_j \to -\infty$ if $h_j(x) < 0$) to make the inner maximization blow up to $\infty$, so this $x$ is ignored by the outer minimization over $x$.
- For any $x$ that does not violate any of the equality or inequality constraints, in the inner maximization problem over $\nu$ the expression $\sum_j \nu_j h_j(x)$ evaluates to 0 no matter what the value of $\nu$ is, and in the inner maximization problem over $\lambda$ the expression $\sum_i \lambda_i f_i(x)$ can at maximum be 0, because $\lambda_i$ is constrained to be non-negative and $f_i(x)$ is non-positive. Therefore, at best, the maximization sets $\lambda_i f_i(x) = 0$, and

$$\max_{\lambda \ge 0,\,\nu} \ \mathcal{L}(x,\lambda,\nu) = f_0(x)$$

In its full form, the objective $\mathcal{L}(x,\lambda,\nu)$ is called the Lagrangian, and it takes into account the unconstrained set of primal variables $x \in \mathbb{R}^d$, the constrained set of dual variables $\lambda \in \mathbb{R}^m$ corresponding to the inequality constraints, and the unconstrained set of dual variables $\nu \in \mathbb{R}^n$ corresponding to the equality constraints. Note that our dual variables $\lambda_i$ are in fact constrained, so ultimately we were not able to turn the original optimization problem into an unconstrained one, but our constraints are much simpler than before.
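To make the Lagrangian concrete, here is a minimal sketch of a generic `lagrangian` helper applied to a made-up problem (minimize $x^2$ subject to $1 - x \le 0$). The helper's name and signature, and the toy problem itself, are assumptions for illustration and do not come from the note.

```python
def lagrangian(x, lam, nu, f0, ineqs, eqs):
    """L(x, lam, nu) = f0(x) + sum_i lam_i * f_i(x) + sum_j nu_j * h_j(x)."""
    if any(l < 0 for l in lam):
        raise ValueError("dual variables for inequality constraints must be >= 0")
    return (f0(x)
            + sum(l * f(x) for l, f in zip(lam, ineqs))
            + sum(v * h(x) for v, h in zip(nu, eqs)))

def objective(x):
    return x ** 2

def f1(x):
    # Inequality constraint f_1(x) = 1 - x <= 0, i.e. x >= 1.
    return 1.0 - x

ineqs, eqs = [f1], []   # the toy problem has no equality constraints

# Feasible point x = 2: every lam >= 0 gives L <= f0(x), and lam = 0 attains f0(x).
feasible_values = [lagrangian(2.0, [lam], [], objective, ineqs, eqs)
                   for lam in (0.0, 1.0, 10.0)]
# Infeasible point x = 0: L keeps growing as lam increases.
infeasible_values = [lagrangian(0.0, [lam], [], objective, ineqs, eqs)
                     for lam in (0.0, 1.0, 10.0)]
print(feasible_values)    # [4.0, 3.0, -6.0]
print(infeasible_values)  # [0.0, 1.0, 10.0]
```

At the feasible point the largest value over the sampled $\lambda$ is $f_0(2) = 4$, attained at $\lambda = 0$, matching the third case of the verification; at the infeasible point the value increases with $\lambda$, matching the first case.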

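The dual of this optimization problem is still over the same optimization objective, except that now we swap the order of the maximization over the dual variables and the minimization over the primal variables.

$$d^* = \max_{\lambda \ge 0,\,\nu} \min_{x} \ \mathcal{L}(x,\lambda,\nu) = \max_{\lambda \ge 0,\,\nu} \ g(\lambda,\nu)$$

The dual is effectively a maximization problem (over the dual variables) of $g(\lambda,\nu) = \min_x \mathcal{L}(x,\lambda,\nu)$.

2 Strong Duality and KKT Conditions

It is always true that the solution to the primal problem is at least as large as the solution to the dual problem:

$$p^* \ge d^* \tag{5}$$

This condition is known as weak duality.

Proof. We know that

$$\forall x,\ \lambda \ge 0,\ \nu: \quad \max_{\tilde\lambda \ge 0,\,\tilde\nu} \mathcal{L}(x,\tilde\lambda,\tilde\nu) \ \ge\ \mathcal{L}(x,\lambda,\nu) \ \ge\ \min_{\tilde x} \mathcal{L}(\tilde x,\lambda,\nu)$$

More compactly,

$$\forall x,\ \lambda \ge 0,\ \nu: \quad \max_{\tilde\lambda \ge 0,\,\tilde\nu} \mathcal{L}(x,\tilde\lambda,\tilde\nu) \ \ge\ \min_{\tilde x} \mathcal{L}(\tilde x,\lambda,\nu)$$

Since this is true for all $x$, $\lambda \ge 0$, $\nu$, it is true in particular when we set

$$x = \arg\min_{\tilde x} \max_{\tilde\lambda \ge 0,\,\tilde\nu} \mathcal{L}(\tilde x,\tilde\lambda,\tilde\nu) \quad \text{and} \quad \lambda,\nu = \arg\max_{\tilde\lambda \ge 0,\,\tilde\nu} \min_{\tilde x} \mathcal{L}(\tilde x,\tilde\lambda,\tilde\nu)$$

We therefore know that

$$p^* = \min_{\tilde x} \max_{\tilde\lambda \ge 0,\,\tilde\nu} \mathcal{L}(\tilde x,\tilde\lambda,\tilde\nu) \ \ge\ \max_{\tilde\lambda \ge 0,\,\tilde\nu} \min_{\tilde x} \mathcal{L}(\tilde x,\tilde\lambda,\tilde\nu) = d^*$$

The difference $p^* - d^*$ is known as the duality gap. In the case of strong duality, the duality gap is 0. That is, we can swap the order of the minimization and maximization and end up with the same optimal value:

$$p^* = d^* \tag{6}$$

There are several useful theorems detailing the existence of strong duality, such as Slater's theorem, which states that if the primal problem is convex, and there exists an $x$ that strictly satisfies the inequality constraints and satisfies the equality constraints, then strong duality holds. Given that strong duality holds, the Karush-Kuhn-Tucker (KKT) conditions can help us find the solutions to the dual variables of the optimization problem. The KKT conditions are composed of: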

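1. Primal feasibility (inequalities): $f_i(x) \le 0, \ \forall i \in [1,m]$
2. Primal feasibility (equalities): $h_j(x) = 0, \ \forall j \in [1,n]$
3. Dual feasibility: $\lambda_i \ge 0, \ \forall i \in [1,m]$
4. Complementary slackness: $\lambda_i f_i(x) = 0, \ \forall i \in [1,m]$
5. Stationarity: $\nabla_x f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla_x f_i(x) + \sum_{j=1}^{n} \nu_j \nabla_x h_j(x) = 0$

Let's see how the KKT conditions relate to strong duality.

Theorem 2.1. If $x^*$ and $\lambda^*, \nu^*$ are the primal and dual solutions respectively, with zero duality gap (i.e. strong duality holds), then $x^*, \lambda^*, \nu^*$ also satisfy the KKT conditions.

Proof. KKT conditions 1, 2, 3 are trivially true, because the primal solution $x^*$ must satisfy the primal constraints, and the dual solution $\lambda^*, \nu^*$ must satisfy the dual constraints. Now, let's prove conditions 4 and 5. Since strong duality holds, we can say that

$$\begin{aligned} p^* = f_0(x^*) &= g(\lambda^*,\nu^*) = d^* \\ &= \min_{x} \mathcal{L}(x,\lambda^*,\nu^*) \\ &\le \mathcal{L}(x^*,\lambda^*,\nu^*) \\ &= f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{j=1}^{n} \nu_j^* h_j(x^*) \\ &= f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) \\ &\le f_0(x^*) \end{aligned}$$

In the fourth step, we can cancel the terms involving $h_j(x^*)$ because we know that the primal solution must satisfy $h_j(x^*) = 0$. In the fifth step, we know that $\lambda_i^* f_i(x^*) \le 0$, because $\lambda_i^* \ge 0$ in order to satisfy the dual constraints, and $f_i(x^*) \le 0$ in order to satisfy the primal constraints. Since we established that $f_0(x^*) = \min_x \mathcal{L}(x,\lambda^*,\nu^*) \le \mathcal{L}(x^*,\lambda^*,\nu^*) \le f_0(x^*)$, the chain begins and ends with the same quantity, so all of the inequalities hold with equality and therefore $\mathcal{L}(x^*,\lambda^*,\nu^*) = \min_x \mathcal{L}(x,\lambda^*,\nu^*)$. Because $x^*$ minimizes $\mathcal{L}(x,\lambda^*,\nu^*)$ over $x$, the gradient with respect to $x$ vanishes there, which is KKT condition 5 (stationarity):

$$\nabla_x f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla_x f_i(x^*) + \sum_{j=1}^{n} \nu_j^* \nabla_x h_j(x^*) = 0$$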

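Finally, note that due to the equality $f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) = f_0(x^*)$, we know that $\sum_{i=1}^{m} \lambda_i^* f_i(x^*) = 0$. This, combined with the fact that $\lambda_i^* f_i(x^*) \le 0$ for all $i$, establishes KKT condition 4 (complementary slackness):

$$\lambda_i^* f_i(x^*) = 0, \ \forall i \in [1,m]$$

The theorem above establishes that in the presence of strong duality, if the solutions are optimal, then they satisfy the KKT conditions. A statement that is almost, but not quite, the converse is also true.

Theorem 2.2. If the primal problem is convex, and if $\bar x$ and $\bar\lambda, \bar\nu$ satisfy the KKT conditions, then they are the optimal solutions to the primal and dual problems, respectively.

Proof. If $\bar x$ and $\bar\lambda, \bar\nu$ satisfy KKT conditions 1, 2, 3, we know that they are at least feasible for the primal and dual problems. From the KKT stationarity condition we know that

$$\nabla_x f_0(\bar x) + \sum_{i=1}^{m} \bar\lambda_i \nabla_x f_i(\bar x) + \sum_{j=1}^{n} \bar\nu_j \nabla_x h_j(\bar x) = 0$$

Since the primal problem is convex, we know that $\mathcal{L}(x,\lambda,\nu)$ is convex in $x$, and if the gradient of $\mathcal{L}(x,\bar\lambda,\bar\nu)$ at $\bar x$ is 0, we know that

$$\bar x = \arg\min_{x} \mathcal{L}(x,\bar\lambda,\bar\nu)$$

Therefore, $\bar x$ solves the inner optimization problem of the dual problem, and

$$g(\bar\lambda,\bar\nu) = f_0(\bar x) + \sum_{i=1}^{m} \bar\lambda_i f_i(\bar x) + \sum_{j=1}^{n} \bar\nu_j h_j(\bar x)$$

By the primal feasibility conditions for $h_j(x)$ and the complementary slackness condition, we know that

$$g(\bar\lambda,\bar\nu) = f_0(\bar x)$$

Now, all we have to do is to prove that $\bar x$ and $\bar\lambda, \bar\nu$ are primal and dual optimal, respectively. Note that since weak duality always holds, we know that

$$p^* \ \ge\ d^* = \max_{\lambda \ge 0,\,\nu} g(\lambda,\nu) \ \ge\ g(\tilde\lambda,\tilde\nu), \quad \forall \tilde\lambda \ge 0,\ \tilde\nu$$

Since we know that $p^* \ge g(\tilde\lambda,\tilde\nu)$, we can also say that for any $\tilde x$

$$f_0(\tilde x) - p^* \ \le\ f_0(\tilde x) - g(\tilde\lambda,\tilde\nu)$$

And if we have that $f_0(\bar x) = g(\bar\lambda,\bar\nu)$, as we deduced earlier, then

$$f_0(\bar x) - p^* \ \le\ f_0(\bar x) - g(\bar\lambda,\bar\nu) = 0$$

Because $\bar x$ is primal feasible, we also have $f_0(\bar x) \ge p^*$, so $f_0(\bar x) = p^*$. Then $d^* \ge g(\bar\lambda,\bar\nu) = f_0(\bar x) = p^* \ge d^*$, which gives

$$p^* = f_0(\bar x) = g(\bar\lambda,\bar\nu) = d^*$$

Therefore, we have proven that $\bar x$ and $\bar\lambda, \bar\nu$ are primal and dual optimal, respectively. Even though we did not initially assume that strong duality holds, we eventually arrived at the conclusion that strong duality does indeed hold.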
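Weak duality, strong duality, and the KKT conditions can all be checked by hand on a small convex problem. The sketch below uses a made-up problem (minimize $x^2$ subject to $1 - x \le 0$) that is not part of the note. Its primal optimum is $x^* = 1$ with $p^* = 1$, and the dual function has the closed form $g(\lambda) = \lambda - \lambda^2/4$, obtained by setting $\partial \mathcal{L}/\partial x = 2x - \lambda = 0$. The function names and the grid over $\lambda$ are assumptions for illustration.

```python
def lagrangian(x, lam):
    # Toy problem (an assumption for illustration): minimize x^2 s.t. 1 - x <= 0.
    return x ** 2 + lam * (1.0 - x)

def dual_function(lam):
    # g(lam) = min_x L(x, lam). L is convex in x and dL/dx = 2x - lam = 0
    # at x = lam / 2, so the minimizer is available in closed form.
    return lagrangian(lam / 2.0, lam)

def kkt_report(x, lam):
    """Evaluate the KKT conditions of the toy problem at a candidate (x, lam)."""
    f1 = 1.0 - x
    return {
        "primal_feasible": f1 <= 0.0,
        "dual_feasible": lam >= 0.0,
        "complementary_slackness": lam * f1,   # should be 0
        "stationarity": 2.0 * x - lam,         # gradient of L in x, should be 0
    }

p_star = 1.0                                   # attained at x* = 1
lams = [k / 100.0 for k in range(0, 501)]      # grid over lam in [0, 5]
g_values = [dual_function(lam) for lam in lams]
d_star = max(g_values)
best_lam = lams[g_values.index(d_star)]

print(all(g <= p_star + 1e-12 for g in g_values))  # weak duality on the grid
print(best_lam, d_star)                            # dual maximizer and dual optimum
print(kkt_report(1.0, best_lam))                   # all conditions satisfied
print(kkt_report(2.0, 0.0))                        # feasible, but not stationary
```

The problem is convex and $x = 2$ strictly satisfies the constraint, so Slater's condition applies and the duality gap is zero: the grid maximum of $g$ equals $p^*$. The pair $(x, \lambda) = (1, 2)$ satisfies all KKT conditions, while $(2, 0)$ is feasible and complementary but fails stationarity, so it is not optimal.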

Let's pause for a second to understand what we've found so far. Given an optimization problem, its primal problem is an optimization problem over the primal variables, and its dual problem is an optimization problem over the dual variables. If we know that strong duality holds, then we can solve either the primal or the dual problem and end up with the same optimal value. When solving for the dual variables, if we know that the primal problem is convex in the primal variables, then we can use the KKT conditions to find the optimal dual variables. We shall do just that in our discussion of SVMs.

3 The Dual of SVMs

Now, let's apply our knowledge of duality to find the dual of the soft-margin SVM optimization problem.

$$\min_{w,t,\xi} \ \underbrace{\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i}_{f(w,t,\xi)} \quad \text{s.t.} \quad \underbrace{(1-\xi_i) - y_i(w^\top x_i - t) \le 0 \ \text{ and } \ -\xi_i \le 0}_{g(w,t,\xi) \,\le\, 0}$$

Let's identify the primal and dual variables for the SVM problem. We will have

- Primal variables $w$, $t$, and $\xi_i$
- Dual variables $\alpha_i$ corresponding to each constraint of the form $y_i(w^\top x_i - t) \ge 1 - \xi_i$
- Dual variables $\beta_i$ corresponding to each constraint of the form $\xi_i \ge 0$

For the purposes of notation, note that we are using $\alpha$ and $\beta$ in place of $\lambda$, and there are no dual variables corresponding to $\nu$ because there are no equality constraints. The Lagrangian for the SVM problem is:

$$\begin{aligned} \mathcal{L}(w,t,\xi,\alpha,\beta) &= \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i\big((1-\xi_i) - y_i(w^\top x_i - t)\big) + \sum_{i=1}^{n} \beta_i(-\xi_i) \\ &= \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i y_i(w^\top x_i - t) + \sum_{i=1}^{n} \alpha_i + \sum_{i=1}^{n} (C - \alpha_i - \beta_i)\xi_i \end{aligned} \tag{7}$$

Thus, the dual is:

$$\max_{\alpha,\beta \ge 0} g(\alpha,\beta) = \max_{\alpha,\beta \ge 0} \ \min_{w,t,\xi} \ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i y_i(w^\top x_i - t) + \sum_{i=1}^{n} \alpha_i + \sum_{i=1}^{n} (C - \alpha_i - \beta_i)\xi_i \tag{8}$$

Let's use the KKT conditions to find the optimal dual variables. Verify that the primal problem is convex in the primal variables. From the stationarity conditions, evaluated at the optimal dual values $\alpha^*$ and $\beta^*$ and the optimal primal values $w^*, t^*, \xi_i^*$, we know that

$$\frac{\partial \mathcal{L}}{\partial w_i} = \frac{\partial \mathcal{L}}{\partial t} = \frac{\partial \mathcal{L}}{\partial \xi_i} = 0$$

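- $\nabla_w \mathcal{L} = w^* - \sum_{i=1}^{n} \alpha_i^* y_i x_i = 0 \implies w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i$. This tells us that $w^*$ is going to be a weighted combination of the positive-class $x_i$'s and negative-class $x_i$'s.
- $\frac{\partial \mathcal{L}}{\partial t} = \sum_{i=1}^{n} \alpha_i^* y_i = 0$. This tells us that the weights $\alpha_i^*$ are balanced between the classes: the total weight on positive-class training points equals the total weight on negative-class training points.
- $\frac{\partial \mathcal{L}}{\partial \xi_i} = C - \alpha_i^* - \beta_i^* = 0 \implies 0 \le \alpha_i^* \le C$. This tells us that the weights $\alpha_i^*$ are restricted to being at most the hyperparameter $C$.

Verify that the other KKT conditions also hold. Using these observations, we can eliminate some terms of the dual problem.

$$\begin{aligned} \mathcal{L}(w,t,\xi,\alpha^*,\beta^*) &= \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i^* y_i(w^\top x_i - t) + \sum_{i=1}^{n} \alpha_i^* + \sum_{i=1}^{n} (C - \alpha_i^* - \beta_i^*)\xi_i \\ &= \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i^* y_i(w^\top x_i) + t\underbrace{\sum_{i=1}^{n} \alpha_i^* y_i}_{=0} + \sum_{i=1}^{n} \alpha_i^* + \sum_{i=1}^{n} \underbrace{(C - \alpha_i^* - \beta_i^*)}_{=0}\,\xi_i \\ &= \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i^* y_i(w^\top x_i) + \sum_{i=1}^{n} \alpha_i^* \end{aligned}$$

We can also rewrite all the optimal primal variables $w^*, t^*, \xi_i^*$ in terms of the optimal dual variables $\alpha_i^*$:

$$\begin{aligned} g(\alpha^*,\beta^*) &= \min_{w,t,\xi} \mathcal{L}(w,t,\xi,\alpha^*,\beta^*) \\ &= \mathcal{L}(w^*,t^*,\xi^*,\alpha^*,\beta^*) \\ &= \frac{1}{2}\Big\|\sum_{i=1}^{n} \alpha_i^* y_i x_i\Big\|^2 - \sum_{i=1}^{n} \alpha_i^* y_i\Big(\Big(\sum_{j=1}^{n} \alpha_j^* y_j x_j\Big)^{\!\top} x_i\Big) + \sum_{i=1}^{n} \alpha_i^* \\ &= \frac{1}{2}\Big\|\sum_{i=1}^{n} \alpha_i^* y_i x_i\Big\|^2 - \sum_{i=1}^{n} \Big(\alpha_i^* y_i x_i^\top \Big(\sum_{j=1}^{n} \alpha_j^* y_j x_j\Big)\Big) + \sum_{i=1}^{n} \alpha_i^* \\ &= \alpha^{*\top}\mathbf{1} - \frac{1}{2}\alpha^{*\top} Q \alpha^* \end{aligned}$$

where $Q_{ij} = y_i(x_i^\top x_j)y_j$ (and $Q = (\operatorname{diag} y)\,XX^\top(\operatorname{diag} y)$).

Now, we can write the final form of the dual, which is only in terms of $\alpha$ (and $X$ and $y$):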

$$\max_{\alpha} \ \alpha^\top \mathbf{1} - \frac{1}{2}\alpha^\top Q \alpha \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \ i = 1,\dots,n \tag{9}$$

3.1 Geometric intuition

We've seen that the optimal value of the dual problem in terms of $\alpha$ is equivalent to the optimal value of the primal problem in terms of $w$, $t$, and $\xi$. But what do these dual values $\alpha_i$ even mean? That's a good question! Recall the following KKT conditions that are enforced:

- Stationarity: $C - \alpha_i^* - \beta_i^* = 0$
- Complementary slackness: $\alpha_i^*\big((1-\xi_i^*) - y_i(w^{*\top} x_i - t^*)\big) = 0$ and $\beta_i^* \xi_i^* = 0$

Here are some noteworthy relationships between $\alpha_i$ and the properties of the SVM:

Case 1: $\alpha_i^* = 0$. In this case, we know $\beta_i^* = C$, which is nonzero, and therefore $\xi_i^* = 0$. That is, if for point $i$ we have that $\alpha_i^* = 0$ by the dual problem, then we know that there is no slack given to this point. Looking at the other complementary slackness condition, this makes sense because if $\alpha_i^* = 0$, then $y_i(w^{*\top} x_i - t^*) - (1-\xi_i^*)$ may be any nonnegative value, and if we're minimizing the sum of our $\xi_i$'s, we should have $\xi_i^* = 0$.

Case 2: $\alpha_i^*$ is nonzero. If this is the case, then we know $\beta_i^* = C - \alpha_i^* \ge 0$.

Case 2.1: $\alpha_i^* = C$. If this is the case, then we know $\beta_i^* = 0$, and therefore $\xi_i^*$ may be exactly 0 or nonzero.

Case 2.2: $\alpha_i^*$ lies strictly between 0 and $C$. In this case, $\beta_i^*$ is nonzero and $\xi_i^* = 0$. But this is different from Case 1 because with $\alpha_i^*$ nonzero, we can divide by $\alpha_i^*$ in the complementary slackness condition and arrive at the fact that $y_i(w^{*\top} x_i - t^*) - 1 = 0$. Since $y_i = \pm 1$, this gives $w^{*\top} x_i - y_i = t^*$, which means $x_i$ lies exactly on the margin determined by $w^*$ and $t^*$.

Lastly, let's reconstruct the optimal primal values $w^*, t^*, \xi_i^*$ from the optimal dual values $\alpha^*$:

$$w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i, \qquad t^* = \operatorname{mean}\big(w^{*\top} x_i - y_i \ \ \forall i : 0 < \alpha_i^* < C\big), \qquad \xi_i^* = \begin{cases} 1 - y_i(w^{*\top} x_i - t^*) & \text{if } \alpha_i^* = C \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
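The reconstruction of the primal values from the dual values can be sketched in a few lines of plain Python. Everything concrete below is an assumption for illustration: the two-point dataset, the choice $C = 1$, the helper names, and the tolerance `tol`. The dual values $\alpha = (0.5, 0.5)$ were derived by hand for this data (the equality constraint forces $\alpha_1 = \alpha_2 = a$, and the dual objective becomes $2a - 2a^2$, maximized at $a = 0.5$). The code uses the sign convention $y_i(w^\top x_i - t) \ge 1 - \xi_i$ from the SVM derivation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    """alpha^T 1 - 0.5 * alpha^T Q alpha, with Q_ij = y_i (x_i^T x_j) y_j."""
    n = len(alpha)
    quad = sum(alpha[i] * y[i] * dot(X[i], X[j]) * y[j] * alpha[j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def recover_primal(alpha, X, y, C, tol=1e-9):
    """Rebuild (w, t, xi) from dual values alpha."""
    n, d = len(X), len(X[0])
    # w = sum_i alpha_i y_i x_i
    w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n)) for k in range(d)]
    # Points with 0 < alpha_i < C lie exactly on the margin: t = w^T x_i - y_i.
    on_margin = [i for i in range(n) if tol < alpha[i] < C - tol]
    if not on_margin:
        raise ValueError("need at least one point with 0 < alpha_i < C to recover t")
    t = sum(dot(w, X[i]) - y[i] for i in on_margin) / len(on_margin)
    # Slack can be nonzero only where alpha_i = C.
    xi = [1.0 - y[i] * (dot(w, X[i]) - t) if alpha[i] >= C - tol else 0.0
          for i in range(n)]
    return w, t, xi

X = [(1.0, 0.0), (-1.0, 0.0)]   # hypothetical toy data
y = [1.0, -1.0]
C = 1.0
alpha = [0.5, 0.5]              # hand-derived dual maximizer for this data

w, t, xi = recover_primal(alpha, X, y, C)
primal_value = 0.5 * dot(w, w) + C * sum(xi)
dual_value = dual_objective(alpha, X, y)
print(w, t, xi)
print(primal_value, dual_value)
```

Both points fall under Case 2.2 (their $\alpha_i$ lie strictly between 0 and $C$), so both sit exactly on the margin and receive zero slack, and the primal and dual objective values coincide, as strong duality predicts. For real data the dual values would come from a quadratic programming solver rather than a hand derivation.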
