arxiv: v1 [math.oc] 6 Jan 2016

Size: px

Start display at page:

Download "arxiv: v1 [math.oc] 6 Jan 2016"

Marcus Carroll
5 years ago
Views:

1 arxv: v1 [math.oc] 6 Jan 2016 THE SUPPORTING HALFSPACE - QUADRATIC PROGRAMMING STRATEGY FOR THE DUAL OF THE BEST APPROXIMATION PROBLEM C.H. JEFFREY PANG Abstract. We consder the best approxmaton problem (BAP) of proectng a pont onto the ntersecton of a number of convex sets. It s known that Dykstra s algorthm s alternatng mnmzaton on the dual problem. We extend Dykstra s algorthm so that t can be enhanced by the SHQP strategy of usng quadratc programmng to proect onto the ntersecton of supportng halfspaces generated by earler proecton operatons. By lookng at a structured alternatng mnmzaton problem, we show the convergence rate of Dykstra s algorthm when reasonable condtons are mposed to guarantee a dual mnmzer. We also establsh convergence of usng a warmstart terate for Dykstra s algorthm, show how all the results for the Dykstra s algorthm can be carred over to the smultaneous Dykstra s algorthm, and dscuss a dfferent way of ncorporatng the SHQP strategy. Lastly, we show that the dual of the best approxmaton problem can have an O(1/k 2 ) accelerated algorthm that also ncorporates the SHQP strategy. Contents 1. Introducton 1 2. Prelmnares: Dykstra s algorthm 4 3. Extended Dykstra s algorthm 6 4. Convergence rate of alternatng mnmzaton and Dykstra s algorthm 9 5. Smultaneous Dykstra s algorthm Usng the APG for (D ) Concluson 20 Appendx A. Proof of convergence of Algorthm References Introducton We consder the followng problem, known as the best approxmaton problem (BAP). Date: September 5, Mathematcs Subect Classfcaton. 41A50, 90C25, 68Q25, 47J25. Key words and phrases. alternatng mnmzaton, Dykstra s algorthm, best approxmaton problem. 1

2 SHQP STRATEGY FOR THE BAP 2 (BAP ) mn f(x) := 1 2 x d 2 (1.1) s.t. x C := C 1 C m, where d s a gven pont and C, = 1,..., m, are closed convex sets n a Hlbert space X. The BAP s equvalent to proectng d onto C. We shall assume throughout that C. We now recall some background on frst order methods, alternatng mnmzaton, and algorthms for the best approxmaton problem Frst order methods and alternatng mnmzaton. When presented wth a problem wth a large number of varables, frst order methods (whch use gradent descent and avod computatonally expensve operatons lke solvng lnear systems) and other methods that decompose the large problems nto smaller peces to be solved may be the only practcal alternatve. For these algorthms, the nonasymptotc or absolute rate of convergence of the functon values to the optmal obectve value hold rght from the very frst teraton of the algorthm, and are more useful than the asymptotc rates. These rates are typcally sublnear, lke O(1/k) for example. Classcal references on frst order methods nclude [NY83], and newer references nclude [Nes04, JN11a, JN11b]. As explaned n [NY83, Nes04], the nonasymptotc rates of convergence of frst order algorthms for smooth convex functons s at best O(1/k 2 ). Nesterov proposed varous O(1/k 2 ) nonasymptotc methods (whch are thus optmal) for such problems (frst method [Nes83], second method [Nes88] and thrd method [Nes05]), and other optmal methods were studed n [AT06, BT09, LLM11]. The paper [BT09] also descrbed an O(1/k 2 ) algorthm for solvng the sum of a smooth convex functon and a structured nonsmooth functon. These optmal methods are also known as accelerated proxmal gradent (APG) methods, and ther desgn and analyss are unfed n the paper [Tse08] (who also dealt wth convex-concave mnmzaton). An optmzaton problem wth a large number of varables can have ts varables dvded nto a number of blocks so that each subproblem has fewer varables. These subproblems are solved n some order (often n a cyclc manner) whle the varables n other blocks are kept fxed. See the formula (2.3) for an elaboraton. Ths s referred to as alternatng mnmzaton (AM), and sometmes referred to Cyclc Coordnate Mnmzaton (CCM). Another alternatve s to perform only gradent descent on each block, whch would reduce to what s descrbed as Cyclc Coordnate Descent, CCD. Methods lke AM and CCD are qute old. If a functon to be mnmzed were nonsmooth, then t s possble for the AM and CCD to be stuck at a non-optmal soluton. A O(1/k) rate of convergence for AM of a two block problem was establshed n [Bec15] wthout any assumpton of strong convexty. As mentoned n [Bec15], AM s also known n the lterature as block-nonlnear Gauss Sedel method or the block coordnate descent method (see for example [Ber99]). They also state the other contrbutons of [Aus76, BT13, Ber99, GS99, LT93]. There has been much recent research on stochastc/ randomzed CD. Snce we are not dealng wth stochastc CD n ths paper, we shall only menton the papers [Nes12, FR15], and defer to the ntroducton and tables n [FR15] for a summary

3 SHQP STRATEGY FOR THE BAP 3 of stochastc CD. A recent work [CP15] also dentfes some cases where the determnstc CD scheme can have an O(1/k 2 ) acceleraton The best approxmaton problem and the method of alternatng proectons. The BAP s often assocated wth the set ntersecton problem (SIP) (SIP ) Fnd x C := C 1 C m. A well studed method for the SIP s the method of alternatng proectons (MAP). We recall materal from [BC11, Deu01a, Deu01b, ER11] on materal on the MAP. As ts name suggests, the MAP proects the terates n a cyclc or some other manner so that the terates converge to a pont n the ntersecton of these sets. One acceleraton of the MAP for convex problems s the supportng halfspace and quadratc programmng strategy (SHQP): The proecton process generates supportng halfspaces of each C, and the set C s a subset of the polyhedron obtaned by ntersectng these halfspaces. Proectng onto the polyhedron can accelerate the convergence of the MAP, and may lead to superlnear convergence n small problems. The SHQP strategy was dscussed n [Pan15c]. See Fgure 4.1 for an llustraton. Ths dea was dscussed n less generalty n [BCK06] and other papers. Other methods of acceleratng the MAP nclude [GPR67, GK89, BDHP03]. As remarked by several authors, the MAP does not converge to the soluton of the BAP. Dykstra s algorthm [Dyk83] solves the best approxmaton problem through a sequence of proectons onto each of the sets n a manner smlar to the MAP, but correcton vectors are added before every proecton. The proof of convergence to P C (d) was establshed n [BD85] and sometmes referred to as the Boyle-Dykstra theorem. Dykstra s algorthm was redscovered by [Han88], who showed that Dykstra s algorthm s equvalent to AM on the dual problem. See also [GM89]. When the sets C are halfspaces, the convergence s asymptotcally lnear [DH94]. A nonasymptotc O(1/k) convergence rate of Dykstra s algorthm was obtaned n [CP15] usng the methods smlar to [BT13, Bec15] when a dual mnmzer exsts. (Ths does not dmnsh the sgnfcance of the Boyle-Dykstra theorem. In our opnon, a quck glance at the respectve proofs shows that the Boyle-Dykstra theorem, whch proves the convergence to the prmal optmal P C (d) even when a dual mnmzer does not exst, s techncally more sophstcated than the proof of the O(1/k) convergence rate of Dykstra s algorthm when a dual mnmzer exsts.) Dykstra s algorthm s qute old, so we refer the reader to the commentary n [Deu01b, ER11] for more on prevous work on Dykstra s algorthm. In contrast to the stuaton for the MAP, not much has been done on acceleratng Dykstra s algorthm for the BAP. The queston of how to accelerate Dykstra s algorthm has been explctly posed as an open problem n [Deu01b, Deu01a, ER11]. A method was proposed n [LR15]. The only property of Dykstra s algorthm needed for ther acceleraton s that Dykstra s algorthm generates a sequence convergng to P C (d), so there s stll be room for mprovng Dykstra s algorthm. See also [HS15]. A varant of Dykstra s algorthm that s more sutable for parallel computatons s the smultaneous Dykstra s algorthm proposed n [IP91] usng the product space formulaton of [Pe84]. Some specfc best approxmaton problems can be solved wth specalzed methods. The proecton of a pont nto the ntersecton of halfspaces can be solved by classcal methods of quadratc programmng. Other sets for whch the proecton

4 SHQP STRATEGY FOR THE BAP 4 onto the ntersecton s easy nclude the ntersecton of an affne space and the semdefnte cone [QS06, Mal04]. The subgradent algorthm can be used to solve a convex constraned optmzaton problem wth a convergence rate of O(1/ k). Hence the BAP can be solved at a rate of O(1/ k). See [Nes04]. In [Pan15b], we obtaned a convergence rate of O(1/k) n the case when the obectve functon s a strongly convex quadratc functon by adaptng a Haugazeau s algorthm [Hau68] (see also [BC11]), whch s another known method for solvng the BAP. We note however that the rate of O(1/k) n Haugazeau s algorthm s typcal, even when solvng a BAP nvolvng only two halfspaces Contrbutons of ths paper. The man contrbuton of ths paper s to extend Dykstra s algorthm so that the SHQP strategy can be ncorporated nto Dykstra s algorthm. (We try to reserve the use of the word acceleraton to mean an O(1/k 2 ) algorthm.) See Algorthm 3.1 for our extenson of Dykstra s algorthm. Recall that the Boyle-Dykstra theorem proves the convergence of Dykstra s algorthm to the prmal soluton of the BAP. We prove that the extended Dykstra s algorthm also converges to the prmal soluton of the BAP (even when there s no dual mnmzer). Next, we show that a commonly occurrng regularty assumpton guarantees the exstence of a dual mnmzer. The exstence of such a dual mnmzer would, by the results n [CP15], mply that Dykstra s algorthm converges at a O(1/k) rate. Ths analyss also carres over to our extended Dykstra s algorthm. We pont out that t s useful to use warmstart solutons for Dykstra s algorthm and our extenson. Whle t s recognzed that Dykstra s algorthm s the alternatng mnmzaton algorthm on the dual, t appears that every descrpton and proof of convergence of Dykstra s algorthm n the lterature starts wth the default zero vector. See further dscussons n Subsecton 2.1. We answer the natural queston of whether Dykstra s algorthm and our extenson converge to the optmal prmal soluton wth a warmstart terate by adaptng the proof of the Boyle-Dykstra Theorem [BD85]. See Appendx A. We show how all these deas mentoned earler can be mplemented for the smultaneous Dykstra s Algorthm n Secton 5. We also explan another way to ncorporate the SHQP strategy on the BAP n Subsecton 4.3 that works when a mnmzer to the dual problem exsts and s more natural to augment to the APG. Whle ths strategy s more natural than our extended Dyktra s algorthm, we were not able to prove ts global convergence usng the framework of the Boyle-Dykstra theorem Notaton. Our notaton s farly standard. For a closed convex set D, we let P D ( ) denote the proecton onto D. The normal cone of D at a pont x D n the usual sense of convex analyss s denoted by N D (x). We wll let ỹ = (y 1,..., y m ). When we dscuss the extended Dykstra s algorthm n Secton 3, we wll need ỹ = (y 1,..., y m, y m+1 ), but ths shouldn t cause too much confuson. 2. Prelmnares: Dykstra s algorthm In ths secton, we recall Dykstra s algorthm and some results. We also gve a dscusson of warmstartng Dykstra s algorthm.

5 SHQP STRATEGY FOR THE BAP 5 Algorthm 2.1. (Warmstart Dykstra s algorthm) Let X be a Hlbert space. Consder the problem of proectng a pont d X onto C X, where C = m =1 C and C are closed convex sets. Choose startng y (0) X for all {1,..., m}, and let x (0) m = d (y (0) y m (0) ). 01 For k = 1, 2, x (k) 0 = x m (k 1) 03 For = 1, 2,..., m 04 z (k) := x (k) 1 + y(k 1) 05 x (k) := P C (z (k) ) 06 y (k) := z (k) x (k) 07 End for 08 End for Let the vector ỹ X m be (y 1,..., y m ), where each y X. For each closed convex set D X, let δ (, D) : X R be defned by δ (y, D) = max x D y, x. (The functon δ (, D) s also the conugate of the ndcator functon δ(, D), thus explanng our notaton.) Defne the dual problem (D ) by (D ) nf y1,...,y m h(y 1,..., y m ) := f(y y m ) + where y X and the f : X R s as n (1.1). We revew some easy results on (D ). m δ (y, C ), (2.1) Proposton 2.2. Let X be a Hlbert space. Let C be closed convex sets n X for {1,..., m}, and let C = m =1 C. Let d X and x = P C (d). Let ỹ = (y 1,..., y m ). We have the followng: (1) nf y1,...,y m h(y 1,..., y m ) = 1 2 d d x 2. (2) Let v : X m R be defned by v(y 1,..., y m ) = 1 2 d (y y m ) x 2 + m δ (y, C x). (2.2) Then v(ỹ) = h(ỹ) d, x x 2, and nfỹ v(ỹ) = 0. (3) We have v(y 1,..., y m ) 1 2 d (y y m ) x 2. (4) If (y 1,..., y m ) s a mnmzer of v( ) (or equvalently, h( )), then x = d (y y m ). (5) If m = 1, then y 1 = d x s a mnmzer of v( ) (or equvalently, h( )). Proof. Statement (1) can be obtaned from [GM89, pages 32 33]. For Statement (2), note that =1 v(ỹ) = 1 2 d (y y m ) x 2 + m δ (y, C x) = 1 2 d [ (y y m ) 2 d (y y m ), x x 2 m ] + δ (y, C ) y y m, x =1 [ m ] = 1 2 d (y y m ) 2 d, x x 2 + δ (y, C ) =1 = h(ỹ) d, x x 2. =1 =1

6 SHQP STRATEGY FOR THE BAP 6 The rest of Statement (2) s elementary. Statements (3) and (4) follow easly from the fact 0 C x, whch gves δ (y, C x) y, 0 = 0. Statement (5) s easy. As explaned n [Han88, GM89] and perhaps other sources, alternatng mnmzaton n the order y (k) 1 = arg mn y (k) y 2 = arg mn y (k). y m = arg mn y h(y, y (k 1) 2, y (k 1) 3,..., y (k 1) m ) (2.3) h(y (k) 1, y, y(k 1) 3,..., y m (k 1) ) h(y (k) 1, y(k) 2,..., y(k) m 1, y), leads to the Dykstra s algorthm as presented n Algorthm 2.1 through Proposton 2.2(5). We also have the followng easly verfable facts: x (k) = d y (k) 1 y (k) 1 y(k) y (k 1) +1 y m (k 1) (2.4) and z (k) = d y (k) 1 y (k) 1 y(k 1) +1 y m (k 1) (2.5) 2.1. Warmstart Dykstra s algorthm. It appears that all descrptons and proofs of convergence of Dykstra s algorthm use the default startng pont y (0) = 0 for all {1,..., m}. We saw earler that Dykstra s algorthm s alternatng mnmzaton on the dual problem wth startng pont ỹ (0). In partcular, the terates ỹ (k) are such that {h(ỹ (k) )} k s a non-ncreasng sequence of real numbers to the dual obectve value. One may then choose a startng pont ỹ (0) such that h(ỹ (0) ) s closer to the dual obectve value than the default startng pont of all zeros. There are several ways to obtan a dfferent startng pont. (1) One can use greedy algorthms (that may not guarantee global convergence to the optmal soluton) to decrease the dual obectve values. A plausble strategy s to use the greedy algorthms tll they do not appear to acheve good decrease n the value h( ), then swtch to the warmstart Dykstra s algorthm, or our extended algorthm n Algorthm 3.1, to guarantee convergence to the optmal prmal soluton. (2) A warmstart soluton may be avalable after solvng a nearby problem. For example, one mght want to resolve a problem after a set has been added or removed, or after a perturbaton of parameters. Alternatvely, there may be a nearby structured problem that can be solved approxmately wth less effort than the orgnal problem. The proof of convergence of Dykstra s algorthm wth a dfferent startng pont s not too dfferent from the Boyle-Dykstra theorem. We defer the proof to Appendx A, where we also prove the convergence of our extended Dykstra s algorthm to be ntroduced n Secton Extended Dykstra s algorthm As mentoned n Subsecton 1.2, the SHQP strategy (of collectng halfspaces contanng C generated by earler proectons and then proectng onto the ntersecton

7 SHQP STRATEGY FOR THE BAP 7 of the halfspaces by QP) can enhance the convergence of the method of alternatng proectons for the set ntersecton problem. In ths secton, we present our extenson of Dykstra s algorthm n Algorthm 3.1 and how t can ncorporate the SHQP strategy. In order to extend the proof of the Boyle-Dykstra theorem to establsh the prmal convergence of our extended Dykstra s algorthm, we need Theorem 3.4(2). The proof of Theorem 3.4(2) llustrates why lnes 8 and 12 of Algorthm 3.1 were desgned as such. The other parts of the Boyle-Dykstra theorem follow wth lttle modfcatons, so we defer the rest of the convergence proof to Appendx A. We now present our extended Dykstra s algorthm. Algorthm 3.1. (Extended Dykstra s algorthm) Consder the BAP (1.1). Let y (0) X be the startng dual varables for each component {1,..., m}. We also ntroduce a varable y (k) m+1 X, wth startng value y(0) m+1 beng 0, n our calculatons. Let Hm+1 0 = X. Set x (0) m+1 = d m+1 01 For k = 1, 2, x (k) 0 = x (k 1) m+1 03 For = 1, 2,..., m 04 z (k) 05 x (k) 06 y (k) := x (k) 1 + y(k 1) := P C (z (k) ) x (k) =1 y(0). := z (k) 07 End for 08 Let Cm+1 k X be such that C Ck m+1 Hk 1 09 z (k) m+1 := x(k) m + y (k 1) m+1 10 x (k) m+1 = P Cm+1 k (z(k) m+1 ) m y (k) m+1 = z(k) m+1 x(k) m+1 12 Let Hm+1 k be the halfspace wth normal y(k) m+1 passng through x(k) m+1,.e., 13 End for H k m+1 = {x : y (k) m+1, x x(k) m+1 0}. Remark 3.2. (Desgnng Cm+1 k ) In lne 8 of Algorthm 3.1, the set Ck m+1 can be chosen to be the ntersecton of Hm+1 k 1 and the halfspaces generated through earler proectons. The proecton P C k m+1 ( ) can then be calculated easly usng methods of quadratc programmng f the number of halfspaces defnng Cm+1 k s small. It s clear to see that Algorthm 3.1 reduces to the orgnal Dykstra s algorthm f we had kept Hm+1 k = Ck m+1 = X for all k {1, 2,... }. The choce of storng halfspaces for n lne 12 smplfes computatons nvolved. H k m+1 Remark 3.3. (Postonng sets of type Cm+1 k ) If the number m s large, then one can ntroduce more than ust one addtonal set of the type Cm+1 k at the end of all the orgnal sets n an mplementaton of Algorthm 3.1. For example, one can ntroduce the addtonal set after every fxed number of orgnal sets so that the quadratc programs formed wll have a manageable number of halfspaces. Theorem 3.4 below wll be crucal n provng that the terates {x (k) } of Algorthm 3.1 converges to the optmal prmal soluton. The proof of the Theorem 3.4 explans how the sets Cm+1 k and Hk m+1 were desgned n order to mantan the concluson n Theorem 3.4(2).

8 SHQP STRATEGY FOR THE BAP 8 Theorem 3.4. (Propertes of Algorthm 3.1) In Algorthm 3.1, defne the dual functon h k : X m+1 R at the kth teraton and h : X m+1 R by [ m ] h k (ỹ) = 1 2 d (y y m+1 ) 2 + δ (y, C ) + δ (y m+1, Hm+1 k ) [ =1 m ] h(ỹ) = 1 2 d (y y m+1 ) 2 + δ (y, C ) + δ (y m+1, C). =1 (3.1) Let ỹ (k) = (y (k) 1,..., y(k) m, y (k) m+1 ). The followng hold: (1) h k 1 (ỹ (k 1) ) h k (ỹ (k) ) + 1 m+1 2 =1 y(k) y (k 1) 2. (2) The sum m+1 =1 =1 y() y ( 1) 2 s fnte. Proof. We have the followng chan of nequaltes: = 1 2 d (y(k) y m (k) ) y (k 1) m δ (y (k 1) m+1, Hk 1 m+1 ) d (y(k) y m (k) ) y (k 1) m δ (y (k 1) m+1, Ck m+1 ) d (y(k) y m (k) ) y (k) m δ (y (k) m+1, Ck m+1) y(k) m+1 y(k 1) m+1 2 d (y(k) y m (k) ) y (k) m δ (y (k) m+1, Hk m+1 ) y(k) m+1 y(k 1) m+1 2. (3.2) The frst nequalty comes from the fact that Cm+1 k Hm+1 k 1, whch mples that δ (, Hm+1 k 1 ) δ (, Cm+1). k The second nequalty comes from the fact that y (k) m+1 s the mnmzer of the strongly convex functon wth modulus 1 defned by y 1 2 d (y(k) y m (k) ) y 2 + δ (y, Cm+1 k ). The fnal equaton follows readly from the defnton of Hm+1 k. We can apply the same prncple n (3.2) to show that for all {1,..., m}, we have d y(k) 1 y (k) 1 y(k 1) d y(k) 1 y (k) 1 y(k) +δ (y (k), C ) y(k) y (k 1) +1 y (k 1) m δ (y (k 1), C ) y (k 1) +1 y (k 1) m y (k 1) (3.3) Combnng (3.2) and (3.3) gves (1). From the fact that Hm+1 k C, we have δ (, Hm+1) k δ (, C), whch n turn mples that h k (y) h(y). Moreover, for each k, we make use of the observaton n Proposton 2.2(1) to get Hence Thus (2) follows. k m y () =1 =1 nf y X hk (y) = mn h(y). m+1 y X m+1 y ( 1) 2 h 0 (ỹ (0) ) h k (ỹ (k) ) h 0 (ỹ (0) ) h(ỹ (k) ) h 0 (ỹ (0) ) mn y h(y). The rest of the proof of the prmal convergence of Algorthm 3.1 s not too dfferent from the Boyle-Dykstra theorem, so we wll prove the convergence result n Appendx A.

9 SHQP STRATEGY FOR THE BAP 9 4. Convergence rate of alternatng mnmzaton and Dykstra s algorthm In ths secton, we frst recall the proof of the O(1/k) convergence rate of alternatng mnmzaton under the assumpton of strong convexty of subproblems and bounded level sets. Ths wll then gve us the convergence rate of the functon h( ) n the dual of Dykstra s algorthm. We also dscuss how ths analyss can be carred over to our extended Dykstra s algorthm. In Subsecton 4.3, we ntroduce another more natural way to ncorporate the SHQP heurstc nto Dykstra s algorthm and attans the nonasymptotc O(1/k) convergence rate when there s a dual mnmzer. But we note that we are unable to prove the global convergence to the prmal optmal soluton for ths new strategy General convergence rate result on alternatng mnmzaton. In ths subsecton, we recall that under certan condtons, alternatng mnmzaton has a nonasymptotc convergence rate of O(1/k). We need the followng result proved n [BT13] and [Bec15]. Lemma 4.1. (Sequence convergence rate) Let α > 0. Suppose the sequence of nonnegatve numbers {a k } k=0 s such that a k a k+1 + αa 2 k+1 for all k {1, 2,... }. (1) [BT13, Lemma 6.2] If furthermore, a α and a α a k 1.5 αk for all k {1, 2,... }. (2) [Bec15, Lemma 3.8] For any k 2, a k max { ( 1 2 In addton, for any ǫ > 0, f { k max ) (k 1)/2 a0, 4 α(k 1) 2 ln(2) [ln(a 0) + ln(1/ǫ)], 4 αǫ }. } + 1,, then then a n ǫ. The second formula refnes the frst by reducng the dependence of a k on the frst few terms of {a }. We now prove our general convergence rate result for alternatng mnmzaton. The followng result was dscussed n [CP15] and ts deas appeared n [BT13, Bec15]. Theorem 4.2. (O(1/k) Convergence rate of alternatng mnmzaton) Let f : X m R be a smooth convex functon, and g : X R be (not necessarly smooth) convex functons for {1,..., m}, Defne h : X m R by h(y 1, y 2,..., y m ) = f(y 1, y 2,..., y m ) + m g (y ). such that (1) The gradent f : X m X m s Lpschtz contnuous wth modulus L, and (2) There s a number µ > 0 such that for all {1,..., m} and fxed varables y 1, y 2,..., y 1, y +1,..., y m, the map =1 y f(y 1, y 2,..., y 1, y, y +1,..., y m ) s strongly convex wth modulus µ > 0.

10 SHQP STRATEGY FOR THE BAP 10 (3) A mnmzer ỹ = (y1, y 2,..., y m ) of h( ) exsts. Moreover, M defned by M = sup{ y (k) y : k 0} s fnte for all {1,..., m 1}. Suppose two successve terates ỹ (k 1) = (y (k 1) 1, y (k 1) 2,..., y m (k 1) ) and ỹ (k) defned smlarly are produced by alternatng mnmzaton descrbed n (2.3). Let M = max {1,...,m 1} M. Then h(ỹ (k 1) ) h(ỹ ) h(ỹ (k) ) h(ỹ ) + Applyng Lemma 4.1 to a k := h(ỹ (k) ) h(ỹ ) gves and µ 2(m 1) 3 M 2 L 2 [h(ỹ (k) ) h(ỹ )] 2. (4.1) h(ỹ (k) ) h(ỹ ) 1 k max{ 3(m 1)3 M 2 L 2 µ, h(ỹ (1) ) h(ỹ ), 2[h(ỹ (2) ) h(ỹ )]}, h(ỹ (k) ) h(ỹ ) max { ( 1 2 ) (k 1)/2 [h(ỹ (0) ) h(ỹ )], 8(m 1)3 M 2 L 2 µ(k 1) Proof. The proof of ths result follows smlar deas as those n [CP15], whch n turn appeared n [BT13, Bec15]. Snce we wll use elements of ths proof for the proof of Theorem 4.5, we now gve a self contaned proof. For each, let h : X R be defned by h (y) = h(y (k) 1, y(k) 2,..., y(k) 1, y, y(k 1) +1,..., y m (k 1) ). In other words, h ( ) s the th block of h : X m R. The mappng h ( ) has mnmzer y (k), and s strongly convex wth modulus µ from assumpton (2). Hence Hence h (y (k 1) ) h (y (k) ) + µ 2 y(k) y (k 1) 2. h(ỹ (k 1) ) h(ỹ ) h(ỹ (k) ) h(ỹ ) + m µ 2 y(k) =1 }. y (k 1) 2. (4.2) Next, we try to fnd a subgradent n h(ỹ (k) ) by lookng at the components h (ỹ). It s clear that 0 h m (y m (k) ). We then look at the th component of f ( ), whch we denote by f ( ). For each {1,..., m}, the optmalty condtons of each teraton of alternatng mnmzaton mples that Thus 0 f (y(k) 1, y(k) 2,..., y(k) f (ỹ (k) ) f (y (k) 1, y(k) 2,..., y(k) 1, y(k) 1, y(k), y (k 1) +1,..., y m (k 1) Choose a subgradent s h(ỹ (k) ), wth s X m such that We have s = f (ỹ(k) ) f (y(k) 1, y(k) 2,..., y(k) s f (ỹ(k) ) f (y(k) 1, y(k) 2,..., y(k) L m y (k) =+1 L m y (k) =2 y (k 1) y (k 1). ) + g (y (k) )., y (k 1) +1,..., y m (k 1) ) f (ỹ (k) ) + g (y (k) ). 1, y(k) 1, y(k), y (k 1) +1,..., y (k 1) m )., y (k 1) +1,..., y m (k 1) ) (4.3)

11 SHQP STRATEGY FOR THE BAP 11 The above dervaton also remnds us that s m = 0. Thus, makng use of condton (3), we have h(ỹ ) h(ỹ (k) ) + s, ỹ ỹ (k) h(ỹ (k) ) h(ỹ ) s, ỹ ỹ (k) m 1 s y y(k) =1 [ m L y (k) y (k 1) =2 [ m (m 1)ML Applyng (4.4) on (4.2) gves h(ỹ (k 1) ) h(ỹ ) h(ỹ (k) ) h(ỹ ) + m h(ỹ (k) ) h(ỹ ) + m y (k) =2 µ 2 y(k) =1 µ 2 y(k) =2 h(ỹ (k) ) h(ỹ ) + µ 2(m 1) ][ m 1 =1 y (k 1) [ m y y(k) ]. y (k 1) 2 y (k 1) 2 y (k) =2 ] y (k 1) h(ỹ (k) ) h(ỹ ) + 2(m 1) 3 M 2 L [h(ỹ (k) ) h(ỹ )] 2. 2 µ ] 2 (4.4) (4.5) Let a k = h(ỹ (k) ) h(ỹ ). Applyng Lemma 4.1 gves us our concluson. (For the µ frst formula, α = mn{ 2(m 1) 3 M 2 L, a 1, 0.75 a 2 }.) It s clear to see that condton (3) n Theorem 4.2 s satsfed when the level sets of h( ) are bounded. Condton (3) can be easly amended to havng all but one of the M for {1,..., m} beng fnte Convergence rate of extended Dykstra s algorthm. In Dykstra s algorthm, the functon f( ) n (1.1) s quadratc, and therefore ts gradent s lnear. Furthermore, each block f ( ) s strongly convex wth modulus 1. Thus condtons (1) and (2) of Theorem 4.2 are satsfed. We make some remarks condton (3) of Theorem 4.2. Remark 4.3. (Condton (3) of Theorem 4.2 for Dykstra s algorthm) As ponted out n [Han88], there may not exst a mnmzer ỹ of the dual problem (D ). Consder for example the problem of proectng onto the ntersecton of two crcles n R 2 ntersectng at only one pont. Furthermore, Gaffke and Mathar [GM89, Lemma 2] showed that for Dykstra s algorthm, f there s a λ > 2 such that x (k) m x 2 O(1/k λ ), then y = lm k y k exsts wth δ (y, C ) fnte, and ỹ = (y1,..., ym) mnmzng the functon h( ) of (2.1). Ths result can somewhat be seen as a converse of Theorem 4.2. Remark 4.4. (Fnteness of the M s) In our analyss of Dykstra s algorthm, suppose all but one of the M s n Theorem 4.2(3) are fnte for {1,..., m}. The Boyle- Dykstra theorem mples that the lmt lm [d y(k) 1 y m (k) ] = lm k k x(k) m exsts. Ths would mply that all the M s are fnte. We now provde the addtonal detals to show that Algorthm 3.1 (the extended Dykstra s algorthm) also converges at an O(1/k) rate.

12 SHQP STRATEGY FOR THE BAP 12 Theorem 4.5. (Convergence rate of extended Dykstra s algorthm) Consder Algorthm 3.1. Recall the defnton of h( ) n (2.1). Suppose the followng holds: (3 ) A mnmzer ỹ = (y1, y 2,..., y m) of h( ) exsts. Moreover, M defned by M = sup{ y (k) y : k 0} s fnte for all {1,..., m + 1}. (Compare ths to condton (3) of Theorem 4.2.) Recall the defnton of h k ( ) n (3.1). Then the sequence {h k (y (k) 1,..., y(k) m+1 )} k, converges to h(ỹ ) at a rate of O(1/k). Proof. We hghlght the dfferences ths proof has wth that of Theorem 4.2. Theorem 3.4(1) shows that h k 1 (ỹ (k 1) ) h k (ỹ (k) ) m+1 y (k) =1 y (k 1) 2, whch plays the role of (4.2). Next, f ỹ = (y 1,..., y m) s a mnmzer of h( ), then (y 1,..., y m, 0) s a mnmzer of hk ( ) for all k. Moreover, h k (y 1,..., y m, 0) = h(ỹ ). Next, we can prove an analogous result to (4.3) wth L = 1. The analogous result to (4.4) s [ m+1 ] h k (ỹ (k) ) h(ỹ ) mml y (k 1). (4.6) The analogous result to (4.5) s =2 y (k) h k 1 (ỹ (k 1) ) h(ỹ ) h k (ỹ (k) ) h(ỹ µ ) + 2m 3 M 2 L [h k (ỹ (k) ) h(ỹ )] 2. 2 (4.7) The concluson follows wth steps smlar to the proof of Theorem 4.2. An ndcator of whether an O(1/k) convergence rate s acheved would be whether condton (3) n Theorem 4.2 s satsfed. The next result gves suffcent condtons. Theorem 4.6. (Condton for bounded dual terates) Suppose X = R n, and consder the BAP (1.1). (1) Suppose at the prmal optmal soluton x = P C (d), we have m v = 0 and v N C (x ) for all {1,..., m} =1 (4.8) mples v = 0 for all {1,..., m}. Then the terates {ỹ (k) } of Dykstra s algorthm are bounded. Moreover, an accumulaton pont exsts, and s an optmal soluton for (D ), so condton (3) of Theorem 4.2 holds. (2) Suppose at the prmal optmal soluton x = P C (d), we have v = 0, v m+1 N C (x ) and v N C (x ) for all {1,..., m} m+1 =1 mples v = 0 for all {1,..., m + 1}. (4.9) Then the terates {ỹ (k) } of the extended Dykstra s algorthm are bounded. Moreover, an accumulaton pont exsts, and s a mnmzer of h : (R n ) m+1 R defned n (3.1), so condton (3 ) of Theorem 4.5 holds. (3) Suppose N C (x ) does not contan a lne for all {1,..., m}. In other words, the cones N C (x ) are ponted for all. Then (4.8) and (4.9) are equvalent.

13 SHQP STRATEGY FOR THE BAP 13 Proof. For (1), we prove the boundedness of the terates for Dykstra s algorthm. The other parts of the result are straghtforward. Seekng a contradcton, suppose the terates {ỹ (k) } are not bounded. Then m y (k) = d x (k) m 1 max y (k) =1 m =1 y (k) = 1 [d max y (k) x(k) m ]. (4.10) By the convergence of Dykstra s algorthm, lm k [d x (k) m ] exsts. Moreover, lm sup k max y (k) =, so by takng a subsequence f necessary (we do not relabel), the lmt of the RHS of (4.10) s zero. Let ŷ (k) y = (k) have m ŷ (k) =1 = 0. max y (k). We thus The sequence {(ŷ (k) 1,..., ŷ(k) m )} k has a convergent subsequence. Let an accumulaton pont be (ŷ1,..., ŷm). Note that ŷ (k) N C (x (k) ), so ŷ N C (x ). But not all the ŷ are zero. Ths gves us the contradcton to (4.8). We now show how to amend the proof of (1) to prove (2). For the extended Dykstra s algorthm, we can obtan the formula max y (k) m+1 1 max y (k) =1 y (k) = 1 max y (k) [d x(k) m+1 ], whch s smlar to (4.10). The sequence {(ŷ (k) 1,..., ŷ(k) m+1 )} k s defned smlarly by ŷ (k) y = (k), and has a convergent subsequence wth accumulaton pont (ŷ1,..., ŷ m ). For any c C, we have As we take lmts, we have ŷ (k) m+1, c x(k) m+1 0. ŷ m+1, c x 0, so ŷm+1 N C (x ). The same steps would mply that (4.9) s volated, hence a contradcton. Lastly, we prove (3). It s obvous that (4.9) mples (4.8) (ust take the partcular case when v m+1 = 0). We now prove that (4.8) mples (4.9). If (4.8) holds, then the formula for ntersecton of normal cones of convex sets (see [RW98, Theorem 6.42]) mples that N C (x ) = m N C (x ). =1 Suppose m+1 =1 v = 0, where v m+1 N C (x ) and v N C (x ) for all {1,..., m}. We can wrte v m+1 = m+1 =1 ṽ, where ṽ N C (x ) for all {1,..., m}. Then m =1 (v + ṽ ) = 0, and (v + ṽ ) N C (x ). Condton (4.8) would mply that v + ṽ = 0 for all {1,..., m}. Snce N C (x ) contans no lnes for all {1,..., m}, we have v = ṽ = 0 for all {1,..., m}. Ths mples that (4.9) holds. Remark 4.7. We make a few remarks on Theorems 4.6 and 4.2.

14 SHQP STRATEGY FOR THE BAP 14 (1) A smple example of a lne and a halfspace shows that (4.8) and (4.9) cannot be equvalent f the condtons n (3) were omtted. Even so, we can check that n ths smple example, the extended Dykstra s algorthm should perform better than the Dykstra s algorthm n general, even when (4.9) fals. See Fgure 4.1. (2) Even f condton (1) n Theorem 4.6 s not satsfed, condton (3) of Theorem 4.2 can hold. For example, consder the case of two (one dmensonal) lnes ntersectng only at the orgn n R 3. (3) The condton (4.8) s well known to be equvalent to the stablty of the sets {C } m =1 under perturbatons. See [Kru06] for example. Condton (4.8) s also mportant for establshng lnear convergence of the method of alternatng proectons for convex sets. See [BB96]. C 1 C 1 C 2 d C 2 d Fgure 4.1. In the dagram on the left, the lne shows the path Dykstra s algorthm takes. But for both the extended Dykstra s algorthm n Algorthm 3.1 (even f (4.9) s not satsfed) and Algorthm 4.8, we have convergence to P C (d) n a small number of steps. The dagram on the rght shows that Algorthms 3.1 and 4.8 are also advantageous for nonpolyhedral problems SHQP strategy for Dykstra s algorthm. We now show that n the case where a mnmzer exsts for h( ) as defned n (2.1), the SHQP strategy can be ncorporated nto Dykstra s algorthm. We present the followng addtonal step. Algorthm 4.8. (SHQP strategy for Dykstra s algorthm) Consder the orgnal warmstart Dykstra s algorthm (Algorthm 2.1). Between lnes 7 and 8, we can add as many copes of the followng code segment as needed. 01 Choose J {1,..., m} 02 Update y (k) 1,..., y(k) m by solvng the followng optmzaton problem (y (k) 1,..., y(k) m ) arg mn y 1,...,y m s.t. f(y y m ) + m δ (y, H ) (4.11) y = y (k) f / J. To llustrate the effectveness of the step n Algorthm 4.8, let us for now assume that J = {1,..., m}. Let y (k), 1,..., y m (k), X be the values of y (k) before lne 2 was performed n Algorthm 4.8, and let y (k),+ 1,..., y m (k),+ be the respectve values after lne 2 was performed. Note that n Dykstra s algorthm, lne 5 (x (k) = P C (z (k) )) s obtaned by proectng onto the set C, and ths proecton produces a supportng halfspace H at x (k) so that H C. Moreover, the halfspace H also satsfes =1 δ (y (k),, C ) = δ (y (k),, H ). (4.12)

15 SHQP STRATEGY FOR THE BAP 15 The ntersecton m =1 H would be a polyhedral outer approxmate of C = m =1 C. Snce H C, we have δ (, H ) δ (, C ). We therefore have y m (k), ) + m δ (y (k),, C ) f(y (k), =1 (4.12) = f(y (k), y m (k), ) + m δ (y (k),, H ) (4.11) f(y (k),+ = y m (k),+ ) + m δ (y (k),+, H ) f(y (k),+ = y m (k),+ ) + m δ (y (k),+, C ). Thus performng the step n lne 2 of Algorthm 4.8 mproves the dual obectve h( ). Note that the mnmzaton problem (4.11) s the dual of the problem of proectng a pont onto the polyhedron J H, whch can be solved effectvely by quadratc programmng f the number of halfspaces s small. If the number of halfspaces s large, then lne 1 of Algorthm 4.8 gves the flexblty of solvng a quadratc program of manageable sze nstead. In general, H can be chosen to be the ntersecton of halfspaces such that (4.12) s vald. If the boundary of C s smooth, then H approxmates C at x (k), and the algorthm reduces to sequental quadratc programmng. Ths gves a reason why the addtonal step n Algorthm 4.8 can be effectve n practce. The step explaned here gves a smlar knd of enhancement to what we saw earler for the extended Dykstra s algorthm. It s clear to see that the recurrence (4.1) s not affected by the addtonal step n Algorthm 4.8. Thus the convergence analyss gven n Subsecton 4.1 remans vald. But when h( ) does not have a mnmzer, we were not able to extend the Boyle-Dykstra Theorem (specfcally, Lemma A.4 below) for the proof of global convergence of the extenson of Dykstra s algorthm usng Algorthm 4.8. =1 5. Smultaneous Dykstra s algorthm Recall that Dykstra s algorthm reduces the best approxmaton problem to a seres of proectons. A varant of Dykstra s algorthm whch s more sutable for parallel computatons s the smultaneous Dykstra s algorthm proposed and studed n [IP91]. In ths secton, we gve some detals on dervng the smultaneous Dykstra s algorthm, and then show how the prncples descrbed n extendng Dykstra s algorthm can be appled for the smultaneous Dykstra s algorthm. Consder the BAP (1.1), where we want to fnd the proecton of d onto C = m =1 C. We now recall the product space formulaton of [Pe84]. Defne C X m and D X m by C := C 1 C m (5.1) and D := {(x,..., x) X m : x X}. Let λ 1,..., λ m be m postve numbers that sum to one, and let the nner product, Q n X m be defned by (u 1,..., u m ), (v 1,... v m ) Q := m λ u, v. =1

16 SHQP STRATEGY FOR THE BAP 16 The proecton of the pont (d,..., d) X m onto C D can easly be seen to be (P C (d),..., P C (d)). Dykstra s algorthm can be appled onto the product space formulaton. Ths gves the smultaneous Dykstra s algorthm proposed and studed n [IP91], whch we present below. Algorthm 5.1. [IP91](Smultaneous Dykstra s algorthm) Consder the BAP (1.1). Let y (0) X be the startng dual varables for each component {1,..., m}. Set x (0) = d m =1 λ y (0). 01 For k = 1, 2, For = 1, 2,..., m (Parallel proecton) 03 z (k) := x (k 1) + y (k 1) 04 x (k) = P C (z (k) ) 05 y (k) = z (k) x (k) 06 end for 07 x (k) = m =1 λ x (k) 08 end for We gve a bref explanaton of the smultaneous Dykstra s algorthm. Lnes 3 to 5 correspond to the proecton onto C. Lne 7 corresponds to proecton of (x (k) 1,..., x(k) m ) onto D,.e., (x (k),..., x (k) ) = P D (x (k) 1,..., x(k) m ). The advantage of the smultaneous Dykstra s algorthm s that lnes 3 to 5 can be performed n parallel. We now dscuss the convergence rate of the smultaneous Dykstra s algorthm. We saw n Secton 4 that the regularty condton (4.8) s a suffcent condton for O(1/k) convergence. We now show that ths regularty condton holds for the orgnal problem f and only f t holds for the product space formulaton. Proposton 5.2. (Equvalence of constrant qualfcaton) Let C be closed convex sets for {1,..., m}, and let C = m =1 C. Let C and D be as defned n (5.1). At a pont x C, the condtons and m v = 0 and v N C (x ) for all {1,..., m} =1 mples v = 0 for all {1,..., m} (5.2) (v 1,..., v m ) + (w 1,..., w m ) = 0, (v 1,..., v m ) N C (x,..., x ) (5.3) and (w 1,..., w m ) N D (x,..., x ) mples (v 1,..., v m ) = (w 1,..., w m ) = 0 are equvalent. Proof. Note that (v 1,..., v m ) N C (x,..., x ) f and only f v N C (x ) for all. Next, snce D s a lnear subspace, we have (w 1,..., w m ) N D (x,..., x ) f and only f (w 1,..., w m ) D. Proposton 5.3 gves the equvalent condton λ w = 0. So n other words, (v 1,..., v m ) + (w 1,..., w m ) = 0, (v 1,..., v m ) N C (x,..., x ) and (w 1,..., w m ) N D (x,..., x )

17 SHQP STRATEGY FOR THE BAP 17 s equvalent to v N C (x ) and w = v for all {1,..., m}, and m λ v = 0. Condtons (5.2) and (5.3) are now easly seen to be equvalent. As s well known n the study of Dykstra s algorthm, no correcton vectors for D are necessary snce D s an affne space. But we need to elaborate on the correcton vector to D before we show the dervaton of x (0). Let ths correcton vector be w (k) = (w (k) 1,..., w(k) m ). We have w (k) N D (x (k),..., x (k) ). But snce D s a lnear subspace, we have w (k) D. We have the followng easy result. Proposton 5.3. Let w = (w 1,..., w m ) be a vector n X m. Then w D f and only f λ w = 0. Proof. Ths follows easly from the followng chan: w D w, v = 0 for all v D λ w, v = 0 for all v X λ w = 0. Let ỹ (k) = (y (k) 1,..., y(k) m ). The default startng vector for the smultaneous Dykstra s algorthm n [IP91] s ỹ (0) = 0 X m, but we can warmstart Dykstra s algorthm as explaned n Subsecton 2.1. We now show that x (0) = d λ y (0) s ndeed the formula to warmstart the smultaneous Dykstra s algorthm. Proposton 5.4. (Formula for x (0) ) In Algorthm 5.1, for the startng dual vector ỹ (k) = (y (k) 1,..., y(k) m ) X m, the startng terate for x (0) s x (0) = d λ y (0). Proof. Let w (k) = (w (k) 1,..., w(k) m ) be the correcton vector correspondng to D. The terates (x (k),..., x (k) ) X m le n D for all k, and (d,..., d) D. From our study of Dykstra s algorthm earler, we have (w (k) 1,..., w(k) m ) = (d,..., d) (x(k),..., x (k) ) (y (k) 1,..., y(k) m ). Moreover, we have λ w (k) = 0 from Proposton 5.3, so λ (d x (k) y (k) ) = 0. Together wth the fact that λ = 1, we get the needed formula for x (0). We now look at how to mprove Algorthm 5.1. Lne 7 can be mproved by proectng (x (k) 1,..., x(k) m ) onto a set better than D. Recall that lne 4 produces supportng halfspaces of the set C. Consder the set Cm+1 k defned as the ntersecton of the supportng halfspaces produced n lne 4, and let C k X m be defned by C k = Cm+1 k Ck m+1 (m copes). We can add the set Ck to play the role of Cm+1 k n the extended Dykstra s algorthm (Algorthm 3.1) to enhance the algorthm A two-level Dykstra s algorthm. If we want to apply the SHQP strategy to enhance the smultaneous Dykstra s algorthm, then we mght want to cut up the problem nto smaller blocks so that the quadratc programs formed are defned by a manageable number of halfspaces. It s reasonable to assume that nformaton about the sets C communcate upwards from the leaves to the root of a tree (n the sense of graph theory). We llustrate wth an example wth m = 4 where we =1

18 SHQP STRATEGY FOR THE BAP 18 break down the sze of the quadratc programs to be at most 2. Let the sets D 1 and D 2 be defned by D 1 = {(x 1, x 2, x 3, x 4 ) X 4 : x 1 = x 2 } and D 2 = {(x 1, x 2, x 3, x 4 ) X 4 : x 3 = x 4 }. We present a two level Dykstra s algorthm. Algorthm 5.5. (Two level Dykstra s algorthm) Consder the BAP (1.1) where m = 4. Let y (0) X be the startng dual varables for each component {1,..., 4}. Set x (0) = d 4 =1 λ y (0). 01 For k = 1, 2, For {1, 2, 3, 4} 03 z (k) := x (k 1) + y (k 1) = P C (z (k) ) = z (k) x (k) 06 end for 07 x (k) (1,2) = λ1 λ 1+λ 2 x (k) 1 + λ2 λ 1+λ 2 x (k) 2 04 x (k) 05 y (k) 08 x (k) (3,4) = λ3 λ 3+λ 4 x (k) 3 + λ4 λ 3+λ 4 x (k) 4 09 x (k) = (λ 1 + λ 2 )x (k) (1,2) + (λ 3 + λ 4 )x (k) (3,4) 10 end for Lnes 2 to 6 descrbe the operaton nvolved n proectng onto C, whch s not dfferent from the smultaneous Dykstra s algorthm (Algorthm 5.1). Lne 7 descrbes the operaton n proectng onto D 1, lne 8 descrbes the operaton n proectng onto D 2, and lne 9 descrbes the operaton n proectng onto D. The agent that collects nformaton on x (k) 1 and x (k) 2 to obtan x (k) (1,2) can also collect the halfspaces generated by the proecton operaton used to obtan x (k) 1 and x (k) 2. We can make use of these halfspaces to form a superset of C that plays the role of Cm+1 k n the extended Dykstra s algorthm (Algorthm 3.1). In other words, the operatons n lnes 8 to 12 of Algorthm 3.1 can be nserted between lnes 7 and 8 of Algorthm 5.5. We can also nsert these same lnes between lnes 8 and 9 and between lnes 9 and 10 to enhance Algorthm 5.5. It s now easy to extend the prncples hghlghted here for problems nvolvng m > 4 sets and wth more than 2 levels. 6. Usng the APG for (D ) In ths secton, we depart from the dual alternatng mnmzaton strategy treated n the rest of the paper, and dscuss usng the accelerated proxmal gradent (APG) algorthm to solve (D ) n (2.1) n order to get a O(1/k 2 ) convergence rate. We remark that the APG can be augmented by the strategy descrbed n Subsecton 4.3. We recall the APG as presented n [Tse08, Secton 3], whch traces ts roots to Nesterov s second optmal method [Nes88]. We decde that t s best to adopt the notaton of [Tse08] even though t conflcts wth some of the notaton we have used n the rest of the paper.

19 SHQP STRATEGY FOR THE BAP 19 Algorthm 6.1. [Tse08, Algorthm 1] Consder the problem of mnmzng h(x) = f(x) + P (x), where f : X R s a smooth convex functon whose gradent f : X X s Lpschtz wth constant L, and P : X R s a (not necessarly smooth) convex functon. For each y X, defne l f ( ; y) : X R (a lnearzaton of h( ) at y) by l f (x; y) = f(y) + f(y), x y + P (x). Choose θ 0 (0, 1], x 0, z 0 dom(p ). k 0. Go to 1. (1) Choose a nonempty closed convex set X k X wth X k dom(p ). Let y k = (1 θ k )x k + θ k z k, (6.1) z k+1 = arg mn x Xk {l f (x; y k ) + θ kl 2 x z k 2 }, (6.2) ˆx k+1 = (1 θ k )x k + θ k z k+1. (6.3) Choose x k+1 such that h(x k+1 ) l f (ˆx k+1 ; y k ) + L 2 ˆx k+1 y k 2. (6.4) Choose θ k+1 (0, 1] satsfyng k k + 1, and go to 1. 1 θ k+1 θ 2 k+1 1. θ (6.5) 2 k The followng s the convergence result of Algorthm 6.1. We smplfy ther result by takng X k = X for all k. Theorem 6.2. [Tse08, Corollary 1(a)] Let {(x k, y k, z k, θ k, X k )} be generated by Algorthm 6.1 wth θ 0 = 1. Fx any ǫ > 0. Suppose θ k 2 k+2 (whch s the case when θ 0 = 1 and θ k+1 s determned from θ k by settng (6.5) to an equaton), and X k = X for all k. Then for any x dom(p ) wth h(x) nf(h) + ǫ, we have mn {h(x )} h(x) + ǫ whenever 4L k =0,1,...,k+1 ǫ x z 0 2. Even though the lne (6.4) s dfferent from that n [Tse08, (14)], t s easy to check that the nequalty [Tse08, (23)] remans vald wth ths change. Theorem 6.2 shows that the nfmum of {h(x k )} k produced by Algorthm 6.1 would converge to the nfmum of h( ). Furthermore, f a mnmzer of h( ) exsts, then the convergence rate of {h(x k ) nf(h)} k s of O(1/k 2 ). For the BAP (1.1) of proectng a pont onto the ntersecton of m sets, the functon h( ) was descrbed n (2.1). For ỹ X m, the mappng has Hessan ỹ d y 2 = f(y y m ) (6.6) I I I I I I..... I I I (.e., there are m 2 blocks n an m m block square matrx), and the gradent of the map n (6.6) s Lpschtz wth constant L = m. The step (6.2) can now be easly carred out usng Proposton 2.2(5) to obtan all m components of the mnmzer

20 SHQP STRATEGY FOR THE BAP 20 z +1. We can use the strategy descrbed n Subsecton 4.3 to get a better terate x k+1 satsfyng (6.4) than ˆx k Concluson In ths paper, we showed ways to ncorporate the SHQP heurstc to mprove Dykstra s algorthm. For the case when C are hyperplanes, the numercal experments n [Pan15a] shows the effectveness of the strateges explaned n ths paper. We defer further numercal experments to future work. Appendx A. Proof of convergence of Algorthm 3.1 In ths appendx, we present the proof of convergence of Algorthm 3.1, the extended Dykstra s algorthm. We already saw that f Hm+1 k = Ck m+1 = X for all k 0, then Algorthm 3.1 reduces to the orgnal Dykstra s algorthm. Apart from Theorem 3.4, our proof s mostly the same as the Boyle-Dykstra theorem [BD85] as presented n [Deu01b]. Note that the proof here also ncludes the warmstart case. Throughout ths secton, we follow the notaton of Algorthm 3.1. We need to follow the notaton n [Deu01b] and defne the sequences {e } = m and { x } =0 by e (m+1)(k 1)+ = y (k) (A.1) x (m+1)(k 1)+ = x (k). (A.2) The statement of Lemma A.6 makes the new notaton more natural. We denote [] to be the nteger n {1,..., m + 1} such that m + 1 dvdes []. Lemma A.1. In Algorthm 3.1, for each 1, such that [] {1,..., m}. Furthermore, f [] = m + 1, then δ (e, C [] y) = x y, e 0 for all y C []. (A.3) δ (e, C /(m+1) m+1 y) = x y, e 0 for all y C. (A.4) Proof. The proof of nequalty (A.3) s exactly the same as [Deu01b, Lemma 9.17], but our statement s now only vald for all n 1. We have x y, e = P C[] ( x 1 + e (m+1) ) y, x 1 + e (m+1) P K[] ( x 1 + e (m+1) ) 0, where the nequalty s an mmedate consequence from the propertes of proectons. The second nequalty n (A.4) s also clear. The equatons n both (A.3) and (A.4) are straghtforward from the defnton of δ (, ). Lemma A.2. In Algorthm 3.1, for each 0, d x = e m + e (m 1) + + e 1 + e. (A.5) Proof. Ths s easly seen from lnes 4 and 9 of Algorthm 3.1 and the formula for z (k) n (2.5). Lemma A.3. In Algorthm 3.1, { x } s a bounded sequence, and x 1 x 2 <. In partcular, =1 x 1 x 0 as. (A.6) (A.7)

21 SHQP STRATEGY FOR THE BAP 21 Proof. Formula (A.6) s ust a rephrasng of Theorem 3.4(2). Formula (A.7) follows easly. We now show the boundedness of { x }. For, let k = m+1. Defne v as v := 1 2 x P C (d) 2 + l= m Recall the defnton of h k ( ) n (3.1). We have v = 1 2 x P C (d) 2 + l= m e l, x l P C (d) e l, x l P C (d). (A.8) = 1 k(m+1) 2 x(k) k(m+1) P C(d) 2 + δ (y (k) l, C [l] P C (d)) +δ (y (k 1) m+1, Ck 1 m+1 P C(d)) + l=1 m l= k(m+1)+1 = h k 1 (y (k) 1, y(k) 2,..., y(k) k(m+1), y(k 1) k(m+1)+1,..., y(k 1) m+1 ) d, P C (d) P C(d) 2. δ (y (k 1) l, C [l] P C (d)) The proof of Theorem 3.4 shows that v s non-ncreasng.snce 0 Cm+1 k 1 P C(d) and 0 C [l] P C (d), we have v 1 2 x P C (d) 2 (ust lke n Proposton 2.2(3)), whch shows that { x } s a bounded sequence. Lemma A.4. In Algorthm 3.1, for any N, e x k 1 x k + k=1 max 1 l m+1 e l (m+1). (A.9) Proof. The proof s adusted from [Deu01b, Lemma 9.21]. We nduct on. It s clear to see that (A.9) holds for all { m,..., 0}. Suppose (A.9) holds for all r. Let M 1 = max 1 l m+1 e l (m+1). Then e +1 = x x +1 + e +1 (m+1) x x +1 + e +1 (m+1) +1 (m+1) x x +1 + x k 1 x k + M 1 +1 x k 1 x k + M 1, k=1 k=1 whch mples that (A.9) holds for r = + 1. Lemma A.5. In Algorthm 3.1, lm nf x k x, e k = 0. (A.10) k= m

22 SHQP STRATEGY FOR THE BAP 22 Proof. The proof needs to be adusted from [Deu01b, Lemma 9.22]. Let M 1 = max 1 l m+1 e l (m+1). Usng Schwarz s nequalty and Lemma A.4, we get k= m x k x, e k e k x k x k= m [( ) ] M 1 + k x 1 x x k x k= m =1 [( ) ( )] k x 1 x x l 1 x l + M 1 x k x k= m =1 l=k+1 k= m (m + 1) x 1 x x l 1 x l + M 1 x k x. =1 l= (m 1) } {{ } (1) k= m } {{ } (2) Term (2) converges to zero by Lemma A.3. Let a = x 1 x. To show our result, t suffces to show that [( ) ( )] lm nf a a l = 0 =1 l= (m 1) gven that =1 a2 s fnte. We refer the reader to the proof n [Deu01b, Lemma 9.22] for the proof of ths fact. Lemma A.6. In Algorthm 3.1, there exsts a subsequence { x } of { x } such that lm sup y x, d x 0 for each y C, and (A.11) lm k= m x k x, e k = 0. (A.12) Proof. The proof s almost exactly the same as [Deu01b, Lemma 9.23]. Lemma A.2, we have for all y C, m that y x, d x = y x, e m + e m e = y x, e k = k= m k= m y x k, e k + k= m By Lemma A.1, the frst sum s no more than 0. Hence y x, d x k= m x k x, e k. x k x, e k. Usng (A.13) By Lemma A.5, we deduce that there s a subsequence { } such that (A.12) holds. Note that the rght hand sde of (A.13) does not depend on y. In vew of (A.13), t follows that (A.11) also holds.

23 SHQP STRATEGY FOR THE BAP 23 Theorem A.7. (Warmstart Boyle-Dykstra Theorem) Consder Algorthm 3.1. Defne the sequence { x n } as n Step 2 of Algorthm 3.1 and (A.2). Then lm x P C (d) = 0. Proof. The proof of ths result s mostly the same as [Deu01b, Lemma 9.23]. By Lemma A.6, there exsts a subsequence { x } such that lm sup y x, d x 0 for each y C. (A.14) Snce { x } s bounded by Lemma A.3, t follows by [Deu01b, Theorem 9.12] (by passng to a further subsequence f necessary), that there s a y 0 X such that x w y0, (A.15) and lm x exsts. (A.16) By another property of Hlbert spaces ([Deu01b, Theorem 9.13]), y 0 lm nf x = lm x. (A.17) Snce [] takes on only m + 1 possbltes, an nfnte number of the s must be of the same value. If ths value s n {1,..., m}, say 0, then an nfnte number of the x s le n C 0. Snce C 0 s closed and convex, t s weakly closed by [Deu01b, Theorem 9.16], and hence y 0 C 0. By (A.7), x x 1 0. By a repeated applcaton of ths fact, we see that all the sequences { x +1}, { x +2},... converge weakly to y 0, and hence y 0 C for every. That s, y 0 C. For any y C, (A.17) and (A.14) mply that y y 0, d y 0 = y, d y, y 0 y 0, d + y 0 2 (A.18) lm [ y, d y, x x, d + x 2 ] = lm y x, d x 0. Hence y 0 = P C (d). Moreover, puttng y = y 0 n (A.18), we get equalty n the chan of nequaltes, and hence lm x 2 = y 0 2 (A.19) and lm y 0 x, d x = 0. By (A.15) and (A.19), t follows from [Deu01b, Theorem 9.10(2)] that x y 0 0. Hence x P C (d) = x y 0 0. (A.20)

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons