14 Lagrange Multipliers

The Method of Lagrange Multipliers is a powerful technique for constrained optimization. While it has applications far beyond machine learning (it was originally developed to solve physics equations), it is used for several key derivations in machine learning.

The problem set-up is as follows: we wish to find extrema (i.e., maxima or minima) of a differentiable objective function

$$E(\mathbf{x}) = E(x_1, x_2, \ldots, x_D) \tag{1}$$

If we have no constraints on the problem, then the extrema must necessarily satisfy the following system of equations:

$$\nabla E = 0 \tag{2}$$

which is equivalent to writing $\frac{dE}{dx_i} = 0$ for all $i$. This equation says that there is no way to infinitesimally perturb $\mathbf{x}$ to get a different value for $E$; the objective function is locally flat.

Now, however, our goal will be to find extrema subject to a single constraint:

$$g(\mathbf{x}) = 0 \tag{3}$$

In other words, we want to find the extrema among the set of points $\mathbf{x}$ that satisfy $g(\mathbf{x}) = 0$.

It is sometimes possible to reparameterize the problem in order to eliminate the constraints (i.e., so that the new parameterization includes all possible solutions to $g(\mathbf{x}) = 0$); however, this can be awkward in some cases, and impossible in others.

Given the constraint $g(\mathbf{x}) = 0$, we are no longer looking for a point where no perturbation in any direction changes $E$. Instead, we need to find a point at which perturbations that satisfy the constraints do not change $E$. This can be expressed by the following condition:

$$\nabla E + \lambda \nabla g = 0 \tag{4}$$

for some arbitrary scalar value $\lambda$. First note that, for points on the contour $g(\mathbf{x}) = 0$, the gradient $\nabla g$ is always perpendicular to the contour (this is a great exercise if you don't remember the proof). Hence the expression $\nabla E = -\lambda \nabla g$ says that the gradient of $E$ must be parallel to the gradient of $g$ at a possible solution point. In other words, any perturbation to $\mathbf{x}$ that changes $E$ also makes the constraint become violated. Perturbations that do not change $g$, and hence still lie on the contour $g(\mathbf{x}) = 0$, do not change $E$ either.
Hence, our goal is to find a point $\mathbf{x}$ that satisfies this condition and also $g(\mathbf{x}) = 0$.

In the Method of Lagrange Multipliers, we define a new objective function, called the Lagrangian:

$$L(\mathbf{x}, \lambda) = E(\mathbf{x}) + \lambda g(\mathbf{x}) \tag{5}$$

Now we will instead find the extrema of $L$ with respect to both $\mathbf{x}$ and $\lambda$. The key fact is that extrema of the unconstrained objective $L$ are the extrema of the original constrained problem. So we have eliminated the nasty constraints by changing the objective function and also introducing new unknowns.

Copyright © 2015 Aaron Hertzmann, David J. Fleet and Marcus Brubaker

Figure 1: The set of solutions to $g(\mathbf{x}) = 0$ visualized as a curve. The gradient $\nabla g$ is always normal to the curve. At an extremal point, $\nabla E$ is parallel to $\nabla g$. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)

To see why, let's look at the extrema of $L$. The extrema of $L$ occur when

$$\frac{dL}{d\lambda} = g(\mathbf{x}) = 0 \tag{6}$$

$$\frac{dL}{d\mathbf{x}} = \nabla E + \lambda \nabla g = 0 \tag{7}$$

which are exactly the conditions given above. In other words, the first equation ensures that $g(\mathbf{x})$ is zero, as desired, and the second equation is our constraint that the gradients of $E$ and $g$ must be parallel. Using the Lagrangian is a convenient way of combining these two constraints into one unconstrained optimization.

14.1 Examples

Minimizing on a circle. We begin with a simple geometric example. We have the following constrained optimization problem:

$$\arg\min_{x,y} \; x + y \tag{8}$$

$$\text{subject to } x^2 + y^2 = 1 \tag{9}$$


Figure 2: Illustration of the maximization on a circle problem. (Image from Wikipedia.)

In other words, we want to find the point on a circle that minimizes $x + y$; the problem is visualized in Figure 2. Here, $E(x,y) = x + y$ and $g(x,y) = x^2 + y^2 - 1$. The Lagrangian for this problem is:

$$L(x, y, \lambda) = x + y + \lambda(x^2 + y^2 - 1) \tag{10}$$

Setting the gradient to zero gives this system of equations:

$$\frac{dL}{dx} = 1 + 2\lambda x = 0 \tag{11}$$

$$\frac{dL}{dy} = 1 + 2\lambda y = 0 \tag{12}$$

$$\frac{dL}{d\lambda} = x^2 + y^2 - 1 = 0 \tag{13}$$

From the first two lines, we can see that $x = y$. Substituting this into the constraint and solving gives two solutions $x = y = \pm\frac{1}{\sqrt{2}}$. Substituting these two solutions into the objective, we see that the minimum is at $x = y = -\frac{1}{\sqrt{2}}$.

Estimating a multinomial distribution. In a multinomial distribution, we have an event $e$ with $K$ possible discrete, disjoint outcomes, where

$$P(e = k) = p_k \tag{14}$$

For example, coin-flipping is a binomial distribution where $K = 2$ and $e = 1$ might indicate that the coin lands heads.
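As a brief aside before the multinomial derivation continues, the circle example (Equations 10 to 13) can be checked numerically. The following is a minimal sketch, not part of the original notes; the function names are hypothetical and chosen only for illustration. It evaluates the gradient of the Lagrangian at the two candidate points and picks the one with the smaller objective.

```python
import math


def lagrangian_gradient(x, y, lam):
    """Gradient of L(x, y, lam) = x + y + lam * (x**2 + y**2 - 1).

    Returns (dL/dx, dL/dy, dL/dlam), matching Equations 11, 12 and 13.
    """
    return (1 + 2 * lam * x, 1 + 2 * lam * y, x**2 + y**2 - 1)


def circle_stationary_points():
    """Return the two stationary points (x, y, lam) with x = y = +/- 1/sqrt(2)."""
    points = []
    for sign in (1.0, -1.0):
        x = y = sign / math.sqrt(2)
        lam = -1.0 / (2 * x)  # solve 1 + 2 * lam * x = 0 for lam
        points.append((x, y, lam))
    return points


def circle_minimizer():
    """Pick the stationary point with the smallest objective value x + y."""
    return min(circle_stationary_points(), key=lambda p: p[0] + p[1])


if __name__ == "__main__":
    for point in circle_stationary_points():
        print(point, lagrangian_gradient(*point))
    print("minimizer:", circle_minimizer())
```

Both candidates make the gradient of the Lagrangian vanish (up to floating-point error); only comparing the objective values distinguishes the minimum from the maximum.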

Suppose we observe $N$ events; the likelihood of the data is:

$$\prod_{i=1}^{N} P(e_i \mid \mathbf{p}) = \prod_{k=1}^{K} p_k^{N_k} \tag{15}$$

where $N_k$ is the number of times that $e = k$, i.e., the number of occurrences of the $k$-th event. To estimate this distribution, we can minimize the negative log-likelihood:

$$\arg\min \; -\sum_k N_k \ln p_k \tag{16}$$

$$\text{subject to } \sum_k p_k = 1, \quad p_k \geq 0 \text{ for all } k \tag{17}$$

The constraints are required in order to ensure that the $p$'s form a valid probability distribution. One way to optimize this problem is to reparameterize: set $p_K = 1 - \sum_{k=1}^{K-1} p_k$, substitute in, and then optimize the unconstrained problem in closed form. While this method does work in this case, it breaks the natural symmetry of the problem, resulting in some messy calculations. Moreover, this method often cannot be generalized to other problems.

The Lagrangian for this problem is:

$$L(\mathbf{p}, \lambda) = -\sum_k N_k \ln p_k + \lambda\left(\sum_k p_k - 1\right) \tag{18}$$

Here we omit the constraint that $p_k \geq 0$ and hope that this constraint will be satisfied by the solution (it will). Setting the gradient to zero gives:

$$\frac{dL}{dp_k} = -\frac{N_k}{p_k} + \lambda = 0 \text{ for all } k \tag{19}$$

$$\frac{dL}{d\lambda} = \sum_k p_k - 1 = 0 \tag{20}$$

Multiplying $dL/dp_k = 0$ by $p_k$ and summing over $k$ gives:

$$0 = \sum_{k=1}^{K} \left(-N_k + \lambda p_k\right) = -N + \lambda \tag{21}$$

since $\sum_k N_k = N$ and $\sum_k p_k = 1$. Hence, the optimal $\lambda = N$. Substituting this into $dL/dp_k$ and solving gives:

$$p_k = \frac{N_k}{N} \tag{22}$$

which is the familiar maximum-likelihood estimator for a multinomial distribution.
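The result in Equation 22 can be sketched in a few lines of code. This is an illustrative sketch rather than anything from the original notes: the function names and the toy list of outcomes are assumptions. It computes $p_k = N_k / N$ from observed outcomes and evaluates the gradient in Equation 19 at $\lambda = N$.

```python
import math
from collections import Counter


def multinomial_mle(outcomes):
    """Return {k: N_k / N}, the estimator of Equation 22."""
    counts = Counter(outcomes)
    n = sum(counts.values())
    return {k: n_k / n for k, n_k in counts.items()}


def negative_log_likelihood(counts, p):
    """Objective of Equation 16: -sum_k N_k ln p_k."""
    return -sum(n_k * math.log(p[k]) for k, n_k in counts.items())


def lagrangian_gradient_p(counts, p, lam):
    """dL/dp_k = -N_k / p_k + lam for every outcome k (Equation 19)."""
    return {k: -n_k / p[k] + lam for k, n_k in counts.items()}


if __name__ == "__main__":
    # Hypothetical observations with K = 3 outcomes and N = 6 events.
    outcomes = ["a", "b", "a", "c", "a", "b"]
    counts = Counter(outcomes)
    p = multinomial_mle(outcomes)
    print("estimate:", p)
    print("gradient at lam = N:", lagrangian_gradient_p(counts, p, lam=len(outcomes)))
    print("negative log-likelihood:", negative_log_likelihood(counts, p))
```

Because every $N_k / N$ is nonnegative, the omitted constraint $p_k \geq 0$ is satisfied automatically, as the text anticipates.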

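Maximum variance PCA. In the original formulation of PCA, the goal is to find a low-dimensional projection of $N$ data points $\mathbf{y}_i$

$$x_i = \mathbf{w}^T(\mathbf{y}_i - \mathbf{b}) \tag{23}$$

such that the variance of the $x_i$'s is maximized, subject to the constraint that $\mathbf{w}^T\mathbf{w} = 1$. The Lagrangian is (writing the multiplier term with a minus sign, which is allowed because the sign of $\lambda$ is arbitrary, so that $\lambda$ comes out as the eigenvalue below):

$$L(\mathbf{w}, \mathbf{b}, \lambda) = \frac{1}{N}\sum_i \left(x_i - \frac{1}{N}\sum_j x_j\right)^2 - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{24}$$

$$= \frac{1}{N}\sum_i \left(\mathbf{w}^T(\mathbf{y}_i - \mathbf{b}) - \frac{1}{N}\sum_j \mathbf{w}^T(\mathbf{y}_j - \mathbf{b})\right)^2 - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{25}$$

$$= \frac{1}{N}\sum_i \left(\mathbf{w}^T\left((\mathbf{y}_i - \mathbf{b}) - \frac{1}{N}\sum_j (\mathbf{y}_j - \mathbf{b})\right)\right)^2 - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{26}$$

$$= \frac{1}{N}\sum_i \left(\mathbf{w}^T(\mathbf{y}_i - \bar{\mathbf{y}})\right)^2 - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{27}$$

$$= \frac{1}{N}\sum_i \mathbf{w}^T(\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})^T\mathbf{w} - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{28}$$

$$= \mathbf{w}^T\left(\frac{1}{N}\sum_i (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})^T\right)\mathbf{w} - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{29}$$

where $\bar{\mathbf{y}} = \sum_i \mathbf{y}_i / N$. Solving $dL/d\mathbf{w} = 0$ gives:

$$\left(\frac{1}{N}\sum_i (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})^T\right)\mathbf{w} = \lambda\mathbf{w} \tag{30}$$

This is just the eigenvector equation: in other words, $\mathbf{w}$ must be an eigenvector of the sample covariance of the $\mathbf{y}$'s, and $\lambda$ must be the corresponding eigenvalue. In order to determine which one, we can substitute this equality into the Lagrangian to get:

$$L = \mathbf{w}^T\lambda\mathbf{w} - \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{31}$$

$$= \lambda \tag{32}$$

since $\mathbf{w}^T\mathbf{w} = 1$. Since our goal is to maximize the variance, we choose the eigenvector $\mathbf{w}$ which has the largest eigenvalue $\lambda$.

We have not yet selected $\mathbf{b}$, but it is clear that the value of the objective function does not depend on $\mathbf{b}$, so we might as well set it to be the mean of the data $\mathbf{b} = \sum_i \mathbf{y}_i / N$, which results in the $x_i$'s having zero mean: $\sum_i x_i / N = 0$.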
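The eigenvector condition in Equation 30 translates directly into code. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses NumPy, a hypothetical function name, and synthetic data. It builds the sample covariance with the $1/N$ normalization used above and takes the eigenvector with the largest eigenvalue.

```python
import numpy as np


def max_variance_direction(Y):
    """Return (w, lam, b) for data Y of shape (N, D).

    w is the unit eigenvector of the sample covariance with the largest
    eigenvalue lam (Equation 30), and b is the data mean.
    """
    b = Y.mean(axis=0)
    centered = Y - b
    cov = centered.T @ centered / len(Y)    # (1/N) sum_i (y_i - ybar)(y_i - ybar)^T
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1], b


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): most variance along the first axis.
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.3])
    w, lam, b = max_variance_direction(Y)
    x = (Y - b) @ w                         # projections x_i = w^T (y_i - b)
    print("largest eigenvalue:", lam)
    print("variance of projections:", x.var())
    print("mean of projections:", x.mean())
```

The variance of the projections should match the largest eigenvalue, which is the statement of Equation 32, and their mean should be zero up to floating-point error.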

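14.2 Least-Squares PCA in one dimension

We now derive PCA for the case of a one-dimensional projection, in terms of minimizing squared error. Specifically, we are given a collection of data vectors $\mathbf{y}_{1:N}$, and wish to find a bias $\mathbf{b}$, a single unit vector $\mathbf{w}$, and one-dimensional coordinates $x_{1:N}$, to minimize:

$$\arg\min_{\mathbf{w}, x_{1:N}, \mathbf{b}} \sum_i \|\mathbf{y}_i - (\mathbf{w}x_i + \mathbf{b})\|^2 \tag{33}$$

$$\text{subject to } \mathbf{w}^T\mathbf{w} = 1 \tag{34}$$

The vector $\mathbf{w}$ is called the first principal component. The Lagrangian is:

$$L(\mathbf{w}, x_{1:N}, \mathbf{b}, \lambda) = \sum_i \|\mathbf{y}_i - (\mathbf{w}x_i + \mathbf{b})\|^2 + \lambda(\|\mathbf{w}\|^2 - 1) \tag{35}$$

There are several sets of unknowns, and we derive their optimal values each in turn.

Projections ($x_i$). We first derive the projections:

$$\frac{dL}{dx_i} = -2\mathbf{w}^T(\mathbf{y}_i - (\mathbf{w}x_i + \mathbf{b})) = 0 \tag{36}$$

Using $\mathbf{w}^T\mathbf{w} = 1$ and solving for $x_i$ gives:

$$x_i = \mathbf{w}^T(\mathbf{y}_i - \mathbf{b}) \tag{37}$$

Bias ($\mathbf{b}$). We begin by differentiating:

$$\frac{dL}{d\mathbf{b}} = -2\sum_i (\mathbf{y}_i - (\mathbf{w}x_i + \mathbf{b})) \tag{38}$$

Substituting in Equation 37 gives

$$\frac{dL}{d\mathbf{b}} = -2\sum_i \left(\mathbf{y}_i - (\mathbf{w}\mathbf{w}^T(\mathbf{y}_i - \mathbf{b}) + \mathbf{b})\right) \tag{39}$$

$$= -2\sum_i \mathbf{y}_i + 2\mathbf{w}\mathbf{w}^T\sum_i \mathbf{y}_i - 2N\mathbf{w}\mathbf{w}^T\mathbf{b} + 2N\mathbf{b} \tag{40}$$

$$= -2(\mathbf{I} - \mathbf{w}\mathbf{w}^T)\sum_i \mathbf{y}_i + 2(\mathbf{I} - \mathbf{w}\mathbf{w}^T)N\mathbf{b} = 0 \tag{41}$$

Cancelling the common factor $2(\mathbf{I} - \mathbf{w}\mathbf{w}^T)$ and rearranging terms gives:

$$\mathbf{b} = \frac{1}{N}\sum_i \mathbf{y}_i \tag{42}$$

Strictly speaking, $\mathbf{I} - \mathbf{w}\mathbf{w}^T$ is a projection matrix and is not invertible, so Equation 42 is one solution of Equation 41 rather than the only one: adding any multiple of $\mathbf{w}$ to $\mathbf{b}$ leaves the reconstruction unchanged once the $x_i$ are re-optimized. The data mean is the natural choice.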
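The two closed-form results so far (Equations 37 and 42) can be verified numerically for any fixed unit vector $\mathbf{w}$. The following is a minimal sketch assuming NumPy; the function names are hypothetical and not taken from the notes. It evaluates the objective of Equation 33, the optimal projections, and the bias gradient in the form of Equation 41.

```python
import numpy as np


def reconstruction_error(Y, w, x, b):
    """Objective of Equation 33: sum_i ||y_i - (w x_i + b)||^2."""
    residual = Y - (np.outer(x, w) + b)
    return float(np.sum(residual ** 2))


def optimal_projections(Y, w, b):
    """Equation 37: x_i = w^T (y_i - b), valid when w has unit length."""
    return (Y - b) @ w


def bias_gradient(Y, w, b):
    """Equation 41: dL/db = -2 (I - w w^T) sum_i y_i + 2 N (I - w w^T) b."""
    P = np.eye(len(w)) - np.outer(w, w)
    return -2 * P @ Y.sum(axis=0) + 2 * len(Y) * P @ b


if __name__ == "__main__":
    # Synthetic data and an arbitrary unit direction (assumptions for the demo).
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(100, 4)) + 3.0
    w = rng.normal(size=4)
    w /= np.linalg.norm(w)
    b = Y.mean(axis=0)
    x = optimal_projections(Y, w, b)
    print("bias gradient at the mean:", bias_gradient(Y, w, b))
    print("error at optimum:", reconstruction_error(Y, w, x, b))
    print("error with perturbed x:", reconstruction_error(Y, w, x + 0.1, b))
```

The gradient with respect to $\mathbf{b}$ vanishes at the data mean, and perturbing the projections away from Equation 37 increases the error.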

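Basis vector ($\mathbf{w}$). To make things simpler, we will define $\tilde{\mathbf{y}}_i = (\mathbf{y}_i - \mathbf{b})$ as the mean-subtracted data points, so that the projections are $x_i = \mathbf{w}^T\tilde{\mathbf{y}}_i$, and the objective function is:

$$L = \sum_i \|\tilde{\mathbf{y}}_i - \mathbf{w}x_i\|^2 + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{43}$$

$$= \sum_i \|\tilde{\mathbf{y}}_i - \mathbf{w}\mathbf{w}^T\tilde{\mathbf{y}}_i\|^2 + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{44}$$

$$= \sum_i (\tilde{\mathbf{y}}_i - \mathbf{w}\mathbf{w}^T\tilde{\mathbf{y}}_i)^T(\tilde{\mathbf{y}}_i - \mathbf{w}\mathbf{w}^T\tilde{\mathbf{y}}_i) + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{45}$$

$$= \sum_i \left(\tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - 2\tilde{\mathbf{y}}_i^T\mathbf{w}\mathbf{w}^T\tilde{\mathbf{y}}_i + \tilde{\mathbf{y}}_i^T\mathbf{w}\mathbf{w}^T\mathbf{w}\mathbf{w}^T\tilde{\mathbf{y}}_i\right) + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{46}$$

$$= \sum_i \left(\tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - (\tilde{\mathbf{y}}_i^T\mathbf{w})^2\right) + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{47}$$

where we have used $\mathbf{w}^T\mathbf{w} = 1$. We then differentiate and simplify:

$$\frac{dL}{d\mathbf{w}} = -2\sum_i \tilde{\mathbf{y}}_i\tilde{\mathbf{y}}_i^T\mathbf{w} + 2\lambda\mathbf{w} = 0 \tag{48}$$

We can rearrange this to get:

$$\left(\sum_i \tilde{\mathbf{y}}_i\tilde{\mathbf{y}}_i^T\right)\mathbf{w} = \lambda\mathbf{w} \tag{49}$$

This is exactly the eigenvector equation, meaning that extrema for $L$ occur when $\mathbf{w}$ is an eigenvector of the matrix $\sum_i \tilde{\mathbf{y}}_i\tilde{\mathbf{y}}_i^T$, and $\lambda$ is the corresponding eigenvalue. Multiplying both sides by $1/N$, we see this matrix has the same eigenvectors as the data covariance:

$$\left(\frac{1}{N}\sum_i (\mathbf{y}_i - \mathbf{b})(\mathbf{y}_i - \mathbf{b})^T\right)\mathbf{w} = \frac{\lambda}{N}\mathbf{w} \tag{50}$$

Now we must determine which eigenvector to use. We rewrite Equation 47 as:

$$L = \sum_i \left(\tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - \mathbf{w}^T\tilde{\mathbf{y}}_i\tilde{\mathbf{y}}_i^T\mathbf{w}\right) + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{51}$$

$$= \sum_i \tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - \mathbf{w}^T\left(\sum_i \tilde{\mathbf{y}}_i\tilde{\mathbf{y}}_i^T\right)\mathbf{w} + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{52}$$

and substitute in Equation 49:

$$L = \sum_i \tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - \lambda\mathbf{w}^T\mathbf{w} + \lambda(\mathbf{w}^T\mathbf{w} - 1) \tag{54}$$

$$= \sum_i \tilde{\mathbf{y}}_i^T\tilde{\mathbf{y}}_i - \lambda \tag{55}$$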

again using $\mathbf{w}^T\mathbf{w} = 1$. We must pick the eigenvalue $\lambda$ that gives the smallest value of $L$. Hence, we pick the largest eigenvalue, and set $\mathbf{w}$ to be the corresponding eigenvector.

14.3 Multiple constraints

When we wish to optimize with respect to multiple constraints $\{g_k(\mathbf{x})\}$, i.e.,

$$\arg\min_{\mathbf{x}} E(\mathbf{x}) \tag{56}$$

$$\text{subject to } g_k(\mathbf{x}) = 0 \text{ for } k = 1 \ldots K \tag{57}$$

Extrema occur when:

$$\nabla E + \sum_k \lambda_k \nabla g_k = 0 \tag{58}$$

where we have introduced $K$ Lagrange multipliers $\lambda_k$. The constraints can be combined into a single Lagrangian:

$$L(\mathbf{x}, \lambda_{1:K}) = E(\mathbf{x}) + \sum_k \lambda_k g_k(\mathbf{x}) \tag{59}$$

14.4 Inequality constraints

The method can be extended to inequality constraints of the form $g(\mathbf{x}) \geq 0$. For a solution to be valid and maximal, there are two possible cases:

- The optimal solution is inside the constraint region, and, hence, $\nabla E = 0$ and $g(\mathbf{x}) > 0$. In this region, the constraint is "inactive," meaning that $\lambda$ can be set to zero.
- The optimal solution lies on the boundary $g(\mathbf{x}) = 0$. In this case, the gradient $\nabla E$ must point in the opposite direction of the gradient of $g$; otherwise, following the gradient of $E$ would cause $g$ to become positive while also modifying $E$. Hence, we must have $\nabla E = -\lambda \nabla g$ for $\lambda \geq 0$.

Note that, in both cases, we have $\lambda g(\mathbf{x}) = 0$. Hence, we can enforce that one of these cases is found with the following optimization problem:

$$\max_{\mathbf{x}, \lambda} \; E(\mathbf{x}) + \lambda g(\mathbf{x}) \tag{60}$$

$$\text{such that } g(\mathbf{x}) \geq 0 \tag{61}$$

$$\lambda \geq 0 \tag{62}$$

$$\lambda g(\mathbf{x}) = 0 \tag{63}$$

These are called the Karush-Kuhn-Tucker (KKT) conditions, which generalize the Method of Lagrange Multipliers.

When minimizing, we want $\nabla E$ to point in the same direction as $\nabla g$ when on the boundary, and so we minimize $E - \lambda g$ instead of $E + \lambda g$.

Figure 3: Illustration of the condition for inequality constraints: the solution may lie on the boundary of the constraint region, or in the interior. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)
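The two cases for inequality constraints can be illustrated on a one-dimensional toy problem. The problem below is an assumption made up for this sketch and does not appear in the notes: maximize $E(x) = -(x - c)^2$ subject to $g(x) = u - x \geq 0$. The code first tries the interior case ($\lambda = 0$), and falls back to the boundary case if the unconstrained maximizer is infeasible.

```python
def solve_kkt_1d(c, u):
    """Maximize E(x) = -(x - c)**2 subject to g(x) = u - x >= 0.

    Returns (x, lam) satisfying the KKT conditions (Equations 61 to 63).
    """
    # Case 1: constraint inactive, lam = 0, so dE/dx = 0 gives x = c.
    x, lam = c, 0.0
    if u - x >= 0:
        return x, lam
    # Case 2: solution on the boundary g(x) = 0, i.e. x = u.
    x = u
    dE = -2.0 * (x - c)  # gradient of E at the boundary point
    dg = -1.0            # gradient of g
    lam = -dE / dg       # from dE + lam * dg = 0
    return x, lam


def kkt_satisfied(x, lam, c, u, tol=1e-12):
    """Check feasibility, lam >= 0, complementary slackness and stationarity."""
    g = u - x
    stationarity = -2.0 * (x - c) + lam * (-1.0)
    return (g >= -tol and lam >= 0
            and abs(lam * g) <= tol and abs(stationarity) <= tol)


if __name__ == "__main__":
    print(solve_kkt_1d(2.0, 3.0))  # unconstrained maximizer x = 2 is feasible
    print(solve_kkt_1d(2.0, 1.0))  # x = 2 is infeasible, so the boundary is active
```

In the second call the multiplier comes out positive, which matches the requirement that $\nabla E$ and $\nabla g$ point in opposite directions on the boundary.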
