Supplementary material for the paper Optimal Transport for Domain Adaptation


Nicolas Courty, Member, IEEE, Rémi Flamary, Devis Tuia, Senior Member, IEEE, Alain Rakotomamonjy, Member, IEEE

I. OPTIMAL TRANSPORT FOR AFFINE TRANSFORMATIONS

Our main hypothesis is that the transformation $T(\cdot)$ preserves the conditional distribution: $P_s(y|x^s) = P_t(y|T(x^s))$. Hence, if we are able to estimate a transformation $\hat{T}(\cdot)$ such that $\hat{T}(x) = T(x)$ for all $x$, we can compute the exact conditional probability distribution of any example $x$ in the target domain through $P_t(y|x) = P_s(y|T^{-1}(x))$, supposing that $T$ is invertible. We can remark that the ability of our approach to recover the conditional probability of an example essentially depends on how well the condition $\hat{T}(x) = T(x)\ \forall x$ is respected. In the following, we prove that, for empirical distributions and under some mild conditions (positive definite $A$), the estimated transportation map $\hat{T}$ obtained by minimizing the $\ell_2$ ground metric does recover exactly the true transformation $T$.

We want to show that, if $\mu_t(x) = \mu_s(Ax+b)$ where $A \in S_+$ and $b \in \mathbb{R}^d$, then the estimated optimal transportation map is $\hat{T}(x) = Ax + b$. This boils down to proving the following theorem.

Theorem 1.1: Let $\mu_s$ and $\mu_t$ be two discrete distributions, each composed of $n$ Diracs as defined in Equation (1) of the paper. If the following conditions hold:
1) the source samples of $\mu_s$ are $x_i^s \in \mathbb{R}^d$, $i \in 1,\dots,n$, such that $x_i^s \neq x_j^s$ if $i \neq j$;
2) all weights in the source and target distributions are equal to $\frac{1}{n}$;
3) the target samples are defined as $x_i^t = A x_i^s + b$, i.e. an affine transformation of the source samples;
4) $b \in \mathbb{R}^d$ and $A \in S_+$ is a strictly positive definite matrix;
5) the cost function is $c(x_i^s, x_j^t) = \|x_i^s - x_j^t\|^2$;
then the solution $\hat{T}$ of the optimal transport problem gives $\hat{T}(x_i^s) = A x_i^s + b = x_i^t$, $\forall i \in 1,\dots,n$.

First, note that the OT problem (and in particular the sum and positivity constraints) forces the solution $\gamma_0$ to be a doubly stochastic matrix weighted by $\frac{1}{n}$. Since the objective function is linear, the solution will be a vertex of the set of doubly stochastic matrices, which is a permutation matrix as stated by the Birkhoff-von Neumann theorem [1].

To prove Theorem 1.1, we first show that, given the above conditions, if the solution of the discrete OT problem is the identity matrix $\gamma_0 = \frac{1}{n}\mathbf{I}$, then the corresponding transportation map gives $\hat{T}(x_i^s) = A x_i^s + b = x_i^t$, $\forall i \in 1,\dots,n$. This property results from the interpolation formula in Equation (14) of the paper, which states that an interpolation along the Wasserstein manifold from the source to the target distribution, with $t \in [0, 1]$, is defined as
$$\hat{\mu} = \sum_{i,j} \gamma_0(i,j)\, \delta_{(1-t)x_i^s + t x_j^t}.$$
When $\gamma_0 = \frac{1}{n}\mathbf{I}$, it leads to
$$\hat{\mu} = \frac{1}{n}\sum_{i} \delta_{(1-t)x_i^s + t x_i^t}.$$
This equation shows that the mass from each sample $x_i^s$ is transported, without any splitting, to its corresponding target sample $x_i^t$, hence $\hat{T}(x_i^s) = x_i^t$. Indeed, we have
$$(1-t)x_i^s + t x_i^t = (1-t)x_i^s + t(A x_i^s + b) = \big((1-t)\mathbf{I} + tA\big)x_i^s + tb, \qquad (1)$$
which for $t = 1$ gives $\hat{T}(x_i^s) = x_i^t$. Hence, the OT solution $\gamma_0 = \frac{1}{n}\mathbf{I}$ yields an exact reconstruction of the transformation on the samples and $\hat{T}(x_i^s) = A x_i^s + b = x_i^t$, $\forall i \in 1,\dots,n$.

Consequently, to prove Theorem 1.1 we just need to prove that the OT solution $\gamma_0$ is equal to $\frac{1}{n}\mathbf{I}$ given the above conditions. This is done in the following for some particular cases of affine transformations and for the general case where $A \in S_+$ and $b \in \mathbb{R}^d$. For better readability, we will denote a sample in the source domain $x_i^s$ as $x_i$, whereas a sample in the target domain is $x_i^t$.

A. Case with no transformation ($A = \mathbf{I}_d$ and $b = 0$, Figure 1)

In this case, there is no displacement and
$$\gamma_0 = \mathop{\arg\min}_{\gamma}\; \langle \gamma, C \rangle_F. \qquad (2)$$
The solution is obviously $\gamma_0 = \frac{1}{n}\mathbf{I}$, since it does not violate the constraints and is the only possible solution with a 0 loss (any mass not on the diagonal of $\gamma$ would imply an increase in the loss).
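To make the interpolation formula above concrete, here is a minimal NumPy sketch (array names and the toy affine map are illustrative, not part of the paper) that builds the support points and weights of the interpolated measure $\hat{\mu}$ for an arbitrary coupling; when the coupling is the scaled identity, each source point travels in a straight line to its target point, which is exactly the displacement used in the proof.

```python
import numpy as np

def interpolated_measure(Xs, Xt, gamma, t):
    """Support points and weights of mu_hat = sum_ij gamma[i, j] * delta_{(1-t) xs_i + t xt_j}."""
    n, m = gamma.shape
    # all pairwise barycenters (1-t) * xs_i + t * xt_j, shape (n, m, d)
    points = (1.0 - t) * Xs[:, None, :] + t * Xt[None, :, :]
    return points.reshape(n * m, -1), gamma.reshape(n * m)

# Toy usage (illustrative): A = 2 I, b = 1, identity coupling gamma_0 = I / n.
rng = np.random.default_rng(0)
n, d = 5, 2
Xs = rng.normal(size=(n, d))
Xt = 2.0 * Xs + 1.0
gamma0 = np.eye(n) / n
pts, w = interpolated_measure(Xs, Xt, gamma0, t=1.0)
# at t = 1, the points carrying non-zero mass coincide with the target samples Xt
```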

Fig. 1: Optimal transport in the affine transformation 1: case with no transformation.

Fig. 2: Optimal transport in the affine transformation 2: case with translation.

Fig. 3: Optimal transport in the affine transformation 3: general case.

B. Case with translation ($A = \mathbf{I}_d$ and $b \in \mathbb{R}^d$, Figure 2)

In this case, the metric matrix becomes $C$ such that
$$C(i,j) = \|x_i - x_j - b\|^2 = \|x_i - x_j\|^2 + \|b\|^2 - 2b^\top(x_i - x_j),$$
and the solution is again $\gamma_0 = \frac{1}{n}\mathbf{I}$. The corresponding permutation $\mathrm{perm} = \mathbf{I}$ leads to the loss
$$J(\mathbf{I}) = \frac{1}{n}\sum_i \|x_i - x_i - b\|^2 = \|b\|^2.$$
In the general case, the loss for a permutation $\mathrm{perm}$ can be expressed as
$$J(\mathrm{perm}) = \frac{1}{n}\sum_i \|x_i - x_{\mathrm{perm}(i)} - b\|^2 = \frac{1}{n}\sum_i \|x_i - x_{\mathrm{perm}(i)}\|^2 + \|b\|^2 - \frac{2}{n}\sum_i (x_i - x_{\mathrm{perm}(i)})^\top b$$
$$= \|b\|^2 + \frac{1}{n}\sum_i \|x_i - x_{\mathrm{perm}(i)}\|^2,$$
where the cross term vanishes because $\sum_i x_{\mathrm{perm}(i)} = \sum_i x_i$ for any permutation. Since $x_i \neq x_j$ if $i \neq j$, for any permutation term $\mathrm{perm}(i) \neq i$ we have $\|x_i - x_{\mathrm{perm}(i)}\|^2 > 0$. This means that any permutation that is not the identity permutation will have a loss strictly larger than the loss of the identity permutation. Therefore, the solution of the optimization problem is the identity permutation. This shows that a translation of the data does not change the solution of the optimal transport.

C. General case ($A \in S_+$ and $b \in \mathbb{R}^d$, Figure 3)

In this case, we will set $b = 0$ since, as proven previously, a translation does not change the optimal transport solution. The loss can be expressed as
$$J(\gamma) = \frac{1}{n}\sum_i \|x_i - A x_{\mathrm{perm}(i)}\|^2 = \frac{1}{n}\sum_i \Big( \|x_i\|^2 + \|A x_{\mathrm{perm}(i)}\|^2 - 2\, x_i^\top A x_{\mathrm{perm}(i)} \Big) = \frac{1}{n}\sum_i \|x_i\|^2 + \frac{1}{n}\sum_i \|A x_i\|^2 - \frac{2}{n}\sum_i x_i^\top A x_{\mathrm{perm}(i)}.$$
The solution of the optimization will then be the permutation that maximizes the term $\sum_i x_i^\top A x_{\mathrm{perm}(i)}$. $A$ is a positive definite matrix, which means that the previous term can be seen as a sum of scalar products $\langle A^{1/2}x_i, A^{1/2}x_{\mathrm{perm}(i)} \rangle = \langle \tilde{x}_i, \tilde{x}_{\mathrm{perm}(i)} \rangle$, where $\tilde{x}_i = A^{1/2}x_i$. It is easy to show that
$$\sum_i \|\tilde{x}_i - \tilde{x}_{\mathrm{perm}(i)}\|^2 = 2\sum_i \|\tilde{x}_i\|^2 - 2\sum_i \langle \tilde{x}_i, \tilde{x}_{\mathrm{perm}(i)} \rangle,$$
which means that maximizing the cross scalar product amounts to minimizing $\sum_i \|\tilde{x}_i - \tilde{x}_{\mathrm{perm}(i)}\|^2 \ge 0$. The identity permutation is then a maximum of the cross scalar product and a minimum of $J$. Since $\tilde{x}_i \neq \tilde{x}_j$ if $i \neq j$ (as $A^{1/2}$ is invertible), the equality stands only for the identity permutation. This means that once again the solution of the optimization problem is the identity matrix.

Note that if $A$ has negative eigenvalues, then the transport fails in reconstructing the original transformation, as illustrated in Figure 4.

Fig. 4: Optimal transport in the affine transformation 4: case with negative eigenvalues.
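Theorem 1.1 is easy to check numerically. The sketch below is illustrative rather than the implementation used in the paper: it relies on SciPy's Hungarian solver `scipy.optimize.linear_sum_assignment`, which is sufficient here because the optimal plan is a permutation matrix by the Birkhoff-von Neumann argument. It recovers the identity permutation for a strictly positive definite $A$, and typically fails to do so when $A$ has negative eigenvalues, as in Figure 4.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d = 50, 3

Xs = rng.normal(size=(n, d))                       # distinct source samples (a.s.)
B = rng.normal(size=(d, d))
A_pd = B @ B.T + d * np.eye(d)                     # strictly positive definite matrix
b = rng.normal(size=d)

def recovered_permutation(A):
    Xt = Xs @ A.T + b                              # target samples x_i^t = A x_i^s + b
    C = cdist(Xs, Xt, metric="sqeuclidean")        # squared-Euclidean ground cost
    _, perm = linear_sum_assignment(C)             # optimal permutation = OT plan support
    return perm

print(np.array_equal(recovered_permutation(A_pd), np.arange(n)))   # True: identity recovered

A_neg = A_pd - 2.0 * np.max(np.linalg.eigvalsh(A_pd)) * np.eye(d)  # all eigenvalues negative
print(np.array_equal(recovered_permutation(A_neg), np.arange(n)))  # usually False
```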

II. INFLUENCE OF THE CHOICE OF THE COST FUNCTION

We examine here the role played by the type of cost function $C$ used in the adaptation. This cost function naturally allows one to take into account the specificity of the data space.

In the case of the Office-Caltech dataset, the features are histograms. We have tested our OT-IT adaptation strategy with alternative costs that operate directly on histograms ($\ell_2$ and $\ell_1$ norms), as it is common practice to consider those metrics when comparing distributions. Also, as mostly done in the literature, we consider the pre-processed features (normalization + z-score) together with the Euclidean and squared Euclidean norms. As can be seen in Table I, the best results are not always obtained with the same metric. In the paper, we chose to only consider the $\ell_2$ norm on preprocessed data, as it is the one that gave the best result on average, but it is relevant to question what is the best metric for an adaptation task. Several other metrics could be considered, like geodesic distances in the case where data live on a manifold. Also, a better tailored metric could be learnt from the data. Some recent works are considering this option [2], [3] for other tasks. This constitutes an interesting perspective for the following extensions of our methodology for the domain adaptation problem.

TABLE I: Results of changing the cost function matrix C on the Office-Caltech dataset.

Domains   l2      l2 + preprocessing   l2^2 + preprocessing   l1
C -> A    36.1    39.14                43.1                   39.77
C -> W    25.8    35.59                37.63                  37.63
C -> D    31.21   43.31                42.68                  42.68
A -> C    32.41   34.64                34.46                  38.29
A -> W    32.88   34.92                32.2                   34.24
A -> D    35.3    36.94                33.76                  36.94
W -> C    22.35   32.59                31.79                  25.56
W -> A    27.56   39.98                38.0                   31.84
W -> D    84.71   90.45                89.81                  92.99
D -> C    26.36   31.79                32.32                  30.63
D -> A    29.54   32.36                30.69                  32.67
D -> W    74.92   87.8                 88.81                  87.12
mean      38.17   44.96                44.6                   44.2
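As an illustration of how the alternative ground costs of Table I can be instantiated, the following sketch builds the four cost matrices with SciPy. The variable names are hypothetical (`Xs`, `Xt` are assumed to hold the histogram features as rows of NumPy arrays), and the exact normalization/z-score choice shown here (statistics taken on the source domain) is an assumption, not necessarily the preprocessing used in the experiments.

```python
import numpy as np
from scipy.spatial.distance import cdist

def zscore(X, mean, std):
    return (X - mean) / (std + 1e-12)

def cost_matrices(Xs, Xt):
    """Ground cost matrices corresponding to the columns of Table I (illustrative)."""
    # preprocessing: unit-sum normalization then z-score (statistics from the source here)
    Xs_n = Xs / Xs.sum(axis=1, keepdims=True)
    Xt_n = Xt / Xt.sum(axis=1, keepdims=True)
    mu, sigma = Xs_n.mean(axis=0), Xs_n.std(axis=0)
    Xs_p, Xt_p = zscore(Xs_n, mu, sigma), zscore(Xt_n, mu, sigma)
    return {
        "l2":            cdist(Xs,   Xt,   metric="euclidean"),
        "l2+preproc":    cdist(Xs_p, Xt_p, metric="euclidean"),
        "sq-l2+preproc": cdist(Xs_p, Xt_p, metric="sqeuclidean"),
        "l1":            cdist(Xs,   Xt,   metric="cityblock"),
    }
```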
III. REMARKS ON GENERALIZED CONDITIONAL GRADIENT

We first present the generalized conditional gradient algorithm as introduced by Bredies et al. [4] and, in the subsequent sections, we provide additional properties that are of interest when applying this algorithm to regularized optimal transport problems.

A. The generalized conditional gradient algorithm

We are interested in the problem of minimizing under constraints a composite function such as
$$\min_{\gamma \in \mathcal{B}} F(\gamma) = f(\gamma) + g(\gamma), \qquad (3)$$
where both $f(\cdot)$ and $g(\cdot)$ are convex and differentiable functions and $\mathcal{B}$ is a compact set of $\mathbb{R}^{n \times n}$. One might want to benefit from this composite structure during the optimization procedure. For instance, if we have an efficient solver for optimizing
$$\min_{\gamma \in \mathcal{B}} \langle \nabla f, \gamma \rangle + g(\gamma), \qquad (4)$$
we propose to use this solver in the optimization scheme instead of linearizing the whole objective function (as one would do with a conditional gradient algorithm [5]). The resulting approach is defined in Algorithm 1.

Conceptually, our algorithm lies in-between the original optimization problem and the conditional gradient. Indeed, if we do not consider any linearization, step 3 of the algorithm is equivalent to solving the original problem and one iterate will suffice for convergence. If we use a full linearization as in the conditional gradient approach, step 3 is equivalent to solving a rough approximation of the original problem. By linearizing only a part of the objective function, we thus optimize a better approximation of that function. This will lead to a provably better certificate of optimality than the one of the conditional gradient algorithm [6]. This makes us believe that if an efficient solver of the partially linearized problem is available, our algorithm is of strong interest.

Note that this partial linearization idea has already been introduced by Bredies et al. [4] for solving problem (3). Their theoretical results related to the resulting algorithm apply when $f(\cdot)$ is differentiable, $g(\cdot)$ is convex, and both $f$ and $g$ satisfy some other mild conditions like coercivity. These results state that the generalized conditional gradient algorithm is a descent method and that any limit point of the algorithm is a stationary point of $f + g$.

In what follows, we provide some results when $f$ and $g$ are differentiable. Some of these results provide novel insights on generalized gradient algorithms (relation between optimality and minimizer of the search direction, convergence rate, and optimality certificate), while some are redundant with those proposed by Bredies (convergence).

B. Convergence of the algorithm

1) Reformulating the search direction: Before discussing convergence of the algorithm, we first reformulate its step 3 so as to make its properties more accessible and its convergence analysis more amenable. The reformulation we propose is
$$s^k = \mathop{\arg\min}_{s \in \mathcal{B}} \; \langle \nabla f(\gamma^k), s - \gamma^k \rangle + g(s) - g(\gamma^k), \qquad (5)$$
and it is easy to note that the problem in line 3 of Algorithm 1 is equivalent to this one and leads to the same solution, since the two objectives differ only by terms that are constant with respect to $s$.

Algorithm 1 Generalized Conditional Gradient (GCG)
1: Initialize $k = 0$ and $\gamma^0 \in \mathcal{B}$
2: repeat
3:   Solve problem $s^k = \mathop{\arg\min}_{\gamma \in \mathcal{B}} \; \langle \nabla f(\gamma^k), \gamma \rangle + g(\gamma)$
4:   Find the optimal step $\alpha^k$, with $\Delta\gamma = s^k - \gamma^k$:
     $\alpha^k = \mathop{\arg\min}_{0 \le \alpha \le 1} f(\gamma^k + \alpha\Delta\gamma) + g(\gamma^k + \alpha\Delta\gamma)$,
     or choose $\alpha^k$ so that it satisfies the Armijo rule.
5:   $\gamma^{k+1} \leftarrow \gamma^k + \alpha^k \Delta\gamma$, set $k \leftarrow k + 1$
6: until convergence

2) Relation between minimizers of Problems (3) and (5): The above formulation allows us to derive a property that highlights the relation between problems (3) and (5).

Proposition 3.1: $\gamma^\star$ is a minimizer of problem (3) if and only if
$$\gamma^\star = \mathop{\arg\min}_{s \in \mathcal{B}} \; \langle \nabla f(\gamma^\star), s - \gamma^\star \rangle + g(s) - g(\gamma^\star). \qquad (6)$$

Proof: The proof relies on the optimality conditions of constrained convex optimization problems. Indeed, for convex and differentiable $f$ and $g$, $\gamma^\star$ is a solution of problem (3) if and only if [7]
$$-\nabla f(\gamma^\star) - \nabla g(\gamma^\star) \in N_{\mathcal{B}}(\gamma^\star), \qquad (7)$$
where $N_{\mathcal{B}}(\gamma)$ is the normal cone of $\mathcal{B}$ at $\gamma$. In the same way, a minimizer $s^\star$ of problem (5) at $\gamma^k$ can also be characterized as
$$-\nabla f(\gamma^k) - \nabla g(s^\star) \in N_{\mathcal{B}}(s^\star). \qquad (8)$$
Now suppose that $\gamma^\star$ is a minimizer of problem (3). It is easy to see that if we choose $\gamma^k = \gamma^\star$, then because $\gamma^\star$ satisfies Equation (7), Equation (8) also holds. Conversely, if $\gamma^\star$ is a minimizer of problem (5) at $\gamma^\star$, then $\gamma^\star$ also satisfies Equation (7).

3) Intermediate results and gap certificate: We prove several lemmas and exhibit a gap certificate that provides a bound on the difference between the objective value along the iterations and the optimal objective value. As one may remark, our algorithm is very similar to a conditional gradient algorithm, the Frank-Wolfe algorithm. As such, our proof of convergence of the algorithm follows similar lines as those used by Bertsekas. Our convergence results are based on the following proposition and definition given in [5].

Proposition 3.2 (from [5]): Let $\{\gamma^k\}$ be a sequence generated by the feasible direction method $\gamma^{k+1} = \gamma^k + \alpha^k \Delta\gamma^k$ with $\Delta\gamma^k = s^k - \gamma^k$. Assume that $\{\Delta\gamma^k\}$ is gradient-related and that $\alpha^k$ is chosen by the limited minimization or the Armijo rule; then every limit point of $\{\gamma^k\}$ is a stationary point.

Definition: A sequence $\Delta\gamma^k$ is said to be gradient-related to the sequence $\gamma^k$ if, for any subsequence of $\{\gamma^k\}_{k \in K}$ that converges to a non-stationary point, the corresponding subsequence $\{\Delta\gamma^k\}_{k \in K}$ is bounded and satisfies
$$\limsup_{k \to \infty,\, k \in K} \nabla F(\gamma^k)^\top \Delta\gamma^k < 0.$$
Basically, this property says that if a subsequence converges to a non-stationary point, then at the limit point the feasible direction defined by $\Delta\gamma$ is still a descent direction.

Before proving that the sequence defined by $\{\Delta\gamma^k\}$ is gradient-related, we prove useful lemmas.

Lemma 3.3: For any $\gamma^k \in \mathcal{B}$, each $\Delta\gamma^k = s^k - \gamma^k$ defines a feasible descent direction.

Proof: By definition, $s^k$ belongs to the convex set $\mathcal{B}$. Hence, for any $\alpha^k \in [0, 1]$, $\gamma^{k+1}$ defines a feasible point. Hence $\Delta\gamma^k$ is a feasible direction. Now let us show that it also defines a descent direction. By definition of the minimizer $s^k$, we have for all $s \in \mathcal{B}$
$$\langle \nabla f(\gamma^k), s^k - \gamma^k \rangle + g(s^k) - g(\gamma^k) \le \langle \nabla f(\gamma^k), s - \gamma^k \rangle + g(s) - g(\gamma^k).$$
Because the above inequality also holds for $s = \gamma^k$, we have
$$\langle \nabla f(\gamma^k), s^k - \gamma^k \rangle + g(s^k) - g(\gamma^k) \le 0. \qquad (9)$$
By convexity of $g(\cdot)$ we have
$$g(s^k) - g(\gamma^k) \ge \langle \nabla g(\gamma^k), s^k - \gamma^k \rangle,$$
which, plugged into Equation (9), leads to
$$\langle \nabla f(\gamma^k) + \nabla g(\gamma^k), s^k - \gamma^k \rangle \le 0$$
and thus $\langle \nabla F(\gamma^k), s^k - \gamma^k \rangle \le 0$, which proves that $\Delta\gamma^k$ is a descent direction.

The next lemma provides an interesting feature of our algorithm. Indeed, the lemma states that the difference between the optimal objective value and the current objective value can be easily monitored.

Lemma 3.4: For all $\gamma^k \in \mathcal{B}$, the following property holds:
$$\min_{s \in \mathcal{B}} \Big[ \langle \nabla f(\gamma^k), s - \gamma^k \rangle + g(s) - g(\gamma^k) \Big] \le F(\gamma^\star) - F(\gamma^k) \le 0,$$
where $\gamma^\star$ is a minimizer of $F$. In addition, if $\gamma^k$ does not belong to the set of minimizers of $F(\cdot)$, then the second inequality is strict.

Proof: By convexity of $f$, we have
$$f(\gamma^\star) - f(\gamma^k) \ge \nabla f(\gamma^k)^\top (\gamma^\star - \gamma^k).$$
By adding $g(\gamma^\star) - g(\gamma^k)$ to both sides of the inequality, we obtain
$$F(\gamma^\star) - F(\gamma^k) \ge \nabla f(\gamma^k)^\top (\gamma^\star - \gamma^k) + g(\gamma^\star) - g(\gamma^k),$$
and because $\gamma^\star$ is a minimizer of $F$, we also have $F(\gamma^\star) - F(\gamma^k) \le 0$. Hence, the following holds:
$$\langle \nabla f(\gamma^k), \gamma^\star - \gamma^k \rangle + g(\gamma^\star) - g(\gamma^k) \le F(\gamma^\star) - F(\gamma^k) \le 0,$$

and we also have
$$\min_{s \in \mathcal{B}} \langle \nabla f(\gamma^k), s - \gamma^k \rangle + g(s) - g(\gamma^k) \le F(\gamma^\star) - F(\gamma^k) \le 0,$$
which concludes the first part of the proof. Finally, if $\gamma^k$ is not a minimizer of $F$, then we naturally have $0 > F(\gamma^\star) - F(\gamma^k)$.

4) Proof of convergence: Now that we have all the pieces of the proof, let us show the key ingredient.

Lemma 3.5: The sequence $\{\Delta\gamma^k\}$ of our algorithm is gradient-related.

Proof: For showing that our direction sequence is gradient-related, we have to show that, given a subsequence $\{\gamma^k\}_{k \in K}$ that converges to a non-stationary point $\bar{\gamma}$, the sequence $\{\Delta\gamma^k\}_{k \in K}$ is bounded and that
$$\limsup_{k \to \infty,\, k \in K} \nabla F(\gamma^k)^\top \Delta\gamma^k < 0.$$
Boundedness of the sequence naturally derives from the facts that $s^k \in \mathcal{B}$, $\gamma^k \in \mathcal{B}$ and the set $\mathcal{B}$ is bounded. The second part of the proof starts by showing that
$$\nabla F(\gamma^k)^\top (s^k - \gamma^k) = \langle \nabla f(\gamma^k) + \nabla g(\gamma^k), s^k - \gamma^k \rangle \le \langle \nabla f(\gamma^k), s^k - \gamma^k \rangle + g(s^k) - g(\gamma^k),$$
where the last inequality is obtained owing to the convexity of $g$. Because that inequality holds for the minimizer $s^k$, it also holds for any vector $s \in \mathcal{B}$:
$$\nabla F(\gamma^k)^\top (s^k - \gamma^k) \le \langle \nabla f(\gamma^k), s - \gamma^k \rangle + g(s) - g(\gamma^k).$$
Taking the limit yields
$$\limsup_{k \to \infty,\, k \in K} \nabla F(\gamma^k)^\top (s^k - \gamma^k) \le \langle \nabla f(\bar{\gamma}), s - \bar{\gamma} \rangle + g(s) - g(\bar{\gamma})$$
for all $s \in \mathcal{B}$. As such, this inequality also holds for the minimizer:
$$\limsup_{k \to \infty,\, k \in K} \nabla F(\gamma^k)^\top (s^k - \gamma^k) \le \min_{s \in \mathcal{B}} \langle \nabla f(\bar{\gamma}), s - \bar{\gamma} \rangle + g(s) - g(\bar{\gamma}).$$
Now, since $\bar{\gamma}$ is not stationary, it is not optimal and it does not belong to the minimizers of $F$; hence, according to Lemma 3.4 above,
$$\min_{s \in \mathcal{B}} \langle \nabla f(\bar{\gamma}), s - \bar{\gamma} \rangle + g(s) - g(\bar{\gamma}) < 0,$$
which concludes the proof.

This latter lemma proves that our direction sequence is gradient-related, thus Proposition 3.2 applies.

5) Rate of convergence: We can show that the objective value $F(x^k)$ converges towards $F(x^\star)$ at a rate of $O(1/k)$ if we have some additional smoothness conditions on $F(\cdot)$. We can easily prove this statement by following the steps proposed by Jaggi et al. [6] for the conditional gradient algorithm. We make the hypothesis that there exists a constant $C_F$ so that, for any $x, y \in \mathcal{B}$ and any $\alpha \in [0, 1]$, the inequality
$$F((1-\alpha)x + \alpha y) \le F(x) + \alpha \nabla F(x)^\top (y - x) + \frac{C_F}{2}\alpha^2$$
holds. Based on this inequality, for a sequence $\{x^k\}$ obtained from the generalized conditional gradient algorithm we have
$$F(x^{k+1}) - F(x^\star) = F((1-\alpha_k)x^k + \alpha_k s^k) - F(x^\star) \le F(x^k) - F(x^\star) + \alpha_k \nabla F(x^k)^\top (s^k - x^k) + \frac{C_F}{2}\alpha_k^2. \qquad (10)$$
Let us denote $h(x^k) = F(x^k) - F(x^\star)$. Now, by adding $\alpha_k[g(s^k) - g(x^k)]$ to both sides of the inequality, we have
$$h(x^{k+1}) + \alpha_k[g(s^k) - g(x^k)] \le h(x^k) + \alpha_k\big[\nabla f(x^k)^\top (s^k - x^k) + g(s^k) - g(x^k)\big] + \alpha_k \nabla g(x^k)^\top (s^k - x^k) + \frac{C_F}{2}\alpha_k^2$$
$$\le h(x^k) + \alpha_k\big[\nabla f(x^k)^\top (x^\star - x^k) + g(x^\star) - g(x^k)\big] + \alpha_k \nabla g(x^k)^\top (s^k - x^k) + \frac{C_F}{2}\alpha_k^2,$$
where the second inequality comes from the definition of the search direction $s^k$. Because $f(\cdot)$ is convex, we have $f(x^\star) - f(x^k) \ge \nabla f(x^k)^\top (x^\star - x^k)$. Thus, we have
$$h(x^{k+1}) + \alpha_k[g(s^k) - g(x^k)] \le h(x^k) + \alpha_k\big[f(x^\star) - f(x^k) + g(x^\star) - g(x^k)\big] + \alpha_k \nabla g(x^k)^\top (s^k - x^k) + \frac{C_F}{2}\alpha_k^2$$
$$\le (1-\alpha_k)h(x^k) + \alpha_k \nabla g(x^k)^\top (s^k - x^k) + \frac{C_F}{2}\alpha_k^2.$$
Now, again owing to the convexity of $g(\cdot)$, we have $-g(s^k) + g(x^k) \le -\nabla g(x^k)^\top (s^k - x^k)$. Using this fact in the last inequality above leads to
$$h(x^{k+1}) \le (1-\alpha_k)h(x^k) + \frac{C_F}{2}\alpha_k^2. \qquad (11)$$
Based on this result, we can now state the following theorem.

Theorem 3.6: For each $k \ge 1$, the iterates $x^k$ of Algorithm 1 satisfy
$$F(x^k) - F(x^\star) \le \frac{2C_F}{k + 2}.$$
Proof: The proof stands on Equation (11) and on the same induction as the one used by Jaggi et al. [6]. Note that this convergence property would also hold if we chose the step size as $\alpha_k = \frac{2}{k+2}$.
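To make the procedure concrete, here is a minimal Python sketch of Algorithm 1 using the fixed step size $\alpha_k = 2/(k+2)$ mentioned above (the experiments in the paper use the line search of step 4 instead). The functions `grad_f`, `g` and `solve_partial` are placeholders introduced for illustration: `solve_partial(G)` is assumed to return $\arg\min_{\gamma \in \mathcal{B}} \langle G, \gamma \rangle + g(\gamma)$, e.g. a Sinkhorn-type solver when $g$ is an entropic regularizer.

```python
def gcg(gamma0, grad_f, solve_partial, n_iter=100):
    """Generalized conditional gradient (Algorithm 1) with step size 2 / (k + 2).

    solve_partial(G) must return argmin_{gamma in B} <G, gamma> + g(gamma),
    i.e. the partially linearized problem of step 3.
    """
    gamma = gamma0
    for k in range(n_iter):
        s_k = solve_partial(grad_f(gamma))   # step 3: partial linearization of f only
        delta = s_k - gamma                  # feasible descent direction (Lemma 3.3)
        alpha = 2.0 / (k + 2.0)              # step size for which Theorem 3.6 still holds
        gamma = gamma + alpha * delta        # step 5: convex combination stays in B
    return gamma
```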

6) Computational performance: Let us first show that the GCG algorithm is more efficient than a classical conditional gradient method such as the one used in [8]. We illustrate this in Figure 5, showing the convergence (top panel) and the corresponding computational times (bottom panel). We take as an example the case of computing the OT-GL transport plan between digits 1 and 2 of USPS and those of MNIST. For this example, we have allowed a maximum of 50 iterations. Regarding convergence of the cost function along the iterations, we can clearly see that, while GCG reaches a nearly optimal objective value in around 10 iterations, the CG approach is still far from convergence after 50 iterations. In addition, the per-iteration cost is significantly higher for the conditional gradient algorithm. For this example, we save an order of magnitude of running time, and yet CG has not converged.

Fig. 5: Example of the evolution of the objective value along the iterations (top) and of the running times (bottom) for the generalized conditional gradient (CGS) and the conditional gradient (CG + Mosek) algorithms.

We now study the computational performance of the different optimal transport strategies in the visual object recognition tasks considered above. We use Python implementations of the different OT methods. The tests were run on a simple MacBook Pro with a 2.4 GHz processor. The original OT-exact solution to the optimal transport problem is computed with the MOSEK [9] linear programming solver (other publicly available solvers were considered, but it turned out this particular one was an order of magnitude faster than the others), whereas the other strategies follow our own implementation based on the Sinkhorn-Knopp method. For the regularized optimal transport problems OT-GL and OT-Laplace, we used the conditional gradient splitting algorithm (the source code will be made available upon acceptance of the article).

We report in Table II the computational time needed by the OT methods. As expected, OT-IT is the least computationally intensive. The solution of the exact optimal transport (OT-exact) is longer to compute by a factor of 4. Also, as expected, the two regularized versions OT-GL and OT-Laplace are the most demanding methods. We recall here that the maximum number of inner loops of the GCG approach was set to 10, meaning that each of those methods made 10 calls to the Sinkhorn-Knopp solver used by OT-IT. However, the added computational cost is mostly due to the line search procedure (line 4 in Algorithm 1), which involves several computations of the cost function. We explain the difference between OT-GL and OT-Laplace by the difference in computation time needed by this procedure. Summing up, one can notice that, even for large problems (case P1 -> P4 for instance, involving 3332 x 3329 variables), the computation time is not prohibitive and remains tractable.

TABLE II: Computational time (seconds) for the best set of regularization parameters.

Domains    OT-exact   OT-IT   OT-GL   OT-Laplace
U -> M     86.0       4.6     92.5    55.6
M -> U     85.0       2.3     75.4    2.5
P1 -> P2   131.0      3.8     432.7   333.2
P1 -> P3   133.9      27.9    456.6   296.2
P1 -> P4   132.5      21.3    276.5   153.4
P2 -> P1   13.9       38.3    666.8   413.1
P2 -> P3   65.1       14.3    418.9   182.8
P2 -> P4   67.8       11.7    27.1    12.0
P3 -> P1   135.7      36.4    519.0   389.0
P3 -> P2   67.6       14.3    297.8   156.6
P3 -> P4   62.2       9.8     248.3   18.8
P4 -> P1   134.6      22.9    54.4    272.6
P4 -> P2   66.7       11.1    262.5   123.0
P4 -> P3   65.3       12.6    269.6   127.5
C -> A     26.9       1.2     22.5    17.8
C -> W     6.8        0.3     6.6     7.4
C -> D     3.4        0.2     3.5     2.4
A -> C     26.5       1.1     29.4    23.8
A -> W     5.6        0.3     6.0     6.0
A -> D     2.9        0.2     3.2     4.1
W -> C     6.8        0.3     16.2    7.6
W -> A     5.7        0.3     4.1     7.4
W -> D     0.8        0.1     2.2     1.1
D -> C     3.6        0.2     14.7    5.9
D -> A     3.0        0.2     12.5    5.1
D -> W     0.8        0.1     3.8     1.1
mean       86.8       2.4     349.1   246.4
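For completeness, here is a minimal sketch of the Sinkhorn-Knopp iterations underlying OT-IT (and used as the inner solver of the GCG loop for the regularized variants). It is illustrative rather than the implementation used in the experiments; the regularization strength and iteration count are placeholder values.

```python
import numpy as np

def sinkhorn_knopp(a, b, C, reg=1e-1, n_iter=1000):
    """Entropically regularized OT plan between histograms a and b for cost C,
    via Sinkhorn-Knopp matrix scaling (illustrative parameters)."""
    K = np.exp(-C / reg)                    # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                   # scale columns to match the marginal b
        u = a / (K @ v)                     # scale rows to match the marginal a
    return u[:, None] * K * v[None, :]      # transport plan diag(u) K diag(v)
```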
REFERENCES

[1] J. von Neumann, "A certain zero-sum two-person game equivalent to the optimal assignment problem," Contributions to the Theory of Games, vol. 2, 1953.

[2] G. Zen, E. Ricci, and N. Sebe, "Simultaneous ground metric learning and matrix factorization with earth mover's distance," in Pattern Recognition (ICPR), 2014 22nd International Conference on, Aug. 2014, pp. 3690-3695.

[3] M. Cuturi and D. Avis, "Ground metric learning," J. Mach. Learn. Res., vol. 15, no. 1, pp. 533-564, Jan. 2014.

[4] K. Bredies, D. Lorenz, and P. Maass, "Equivalence of a generalized conditional gradient method and the method of surrogate functionals," Zentrum für Technomathematik, 2005.

[5] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, 1999.

[6] M. Jaggi, "Revisiting Frank-Wolfe: projection-free sparse convex optimization," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 427-435.

[7] D. Bertsekas, A. Nedic, and A. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, 2003.

[8] S. Ferradans, N. Papadakis, G. Peyré, and J.-F. Aujol, "Regularized discrete optimal transport," SIAM Journal on Imaging Sciences, vol. 7, no. 3, 2014.

[9] E. Andersen and K. Andersen, "The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm," in High Performance Optimization, vol. 33 of Applied Optimization, pp. 197-232, Springer US, 2000.