A BUNDLE METHOD FOR A CLASS OF BILEVEL NONSMOOTH CONVEX MINIMIZATION PROBLEMS


SIAM J. OPTIM. Vol. 18, No. 1 © 2007 Society for Industrial and Applied Mathematics

A BUNDLE METHOD FOR A CLASS OF BILEVEL NONSMOOTH CONVEX MINIMIZATION PROBLEMS

MIKHAIL V. SOLODOV

Abstract. We consider the bilevel problem of minimizing a nonsmooth convex function over the set of minimizers of another nonsmooth convex function. Standard convex constrained optimization is a particular case in this framework, corresponding to taking the lower-level function as a penalty of the feasible set. We develop an explicit bundle-type algorithm for solving the bilevel problem, where each iteration consists of making one descent step for a weighted sum of the upper- and lower-level functions, after which the weight can be updated immediately. Convergence is shown under very mild assumptions. We note that in the case of standard constrained optimization, the method does not require iterative solution of any penalization subproblems, not even approximately, and does not assume any regularity of constraints (e.g., the Slater condition). We also present some computational experiments for minimizing a nonsmooth convex function over a set defined by linear complementarity constraints.

Key words. bilevel optimization, convex optimization, nonsmooth optimization, bundle methods, penalty methods

AMS subject classifications. 90C30, 65K05, 49D27

1. Introduction. We consider a class of bilevel problems of the form

(1.1)  minimize f_1(x)  subject to x ∈ S_2 = arg min{f_2(x) : x ∈ R^n},

where f_1 : R^n → R and f_2 : R^n → R are convex functions, in general nondifferentiable. The above is a special case of the mathematical program with generalized equation (or equilibrium) constraint [20, 7], which is

minimize f_1(x)  subject to x ∈ {x ∈ R^n : 0 ∈ T(x)},

where T is a set-valued mapping from R^n to the subsets of R^n. The bilevel problem (1.1) is obtained by setting T(x) = ∂f_2(x), x ∈ R^n.
In the formulation of the problem considered here, there is only one (decision) variable x ∈ R^n, and we are interested in identifying specific solutions of the inclusion 0 ∈ T(x) (equivalently, of the lower-level minimization problem in (1.1)); see [7]. Problems of the form (1.1) are also sometimes referred to as hierarchical optimization; see, e.g., [12, 4]. Note that, as a special case, (1.1) contains the standard convex constrained optimization problem

(1.2)  minimize f_1(x)  subject to g_i(x) ≤ 0, i = 1, ..., m,

Received by the editors December 14, 2005; accepted for publication (in revised form) October 24, 2006; published electronically April 3, 2007. This research is supported in part by CNPq grants /2005-4, /2005-2, and /2005-8, by PRONEX-Optimization, and by FAPERJ grant E-26/ /.

Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro, RJ, Brazil (solodov@impa.br).

where g : R^n → R^m is a (nonsmooth) convex function. Indeed, (1.2) is obtained from (1.1) by taking f_2(x) = p(x), where p : R^n → R_+ is some penalty of the constraints, e.g.,

(1.3)  f_2(x) = p(x) = Σ_{i=1}^m max{0, g_i(x)}.

In this paper, we show that the bilevel problem (1.1) can be solved by a properly designed (proximal) bundle method [15, 11, 3], iteratively applied to the parametrized family of functions

(1.4)  F_σ(x) = σ f_1(x) + f_2(x),  σ > 0,

where σ varies along the iterations. Specifically, if x^k ∈ R^n is the current iterate and σ_k > 0 is the current parameter, it is enough to make just one descent step for F_{σ_k} from the point x^k, after which the parameter σ_k can be immediately updated. We emphasize that at no iteration is the function F_{σ_k} minimized to any prescribed precision. Once the descent condition is achieved, the parameter can be updated immediately and we can start working with the new function F_{σ_{k+1}}. For convergence of the resulting algorithm to the solution set of (1.1), the parameters {σ_k} should be chosen in such a way that

(1.5)  lim_{k→∞} σ_k = 0,  Σ_{k=0}^∞ σ_k = +∞.

The requirement that σ_k must tend to zero is natural and indispensable, as can be seen from the case of standard optimization (1.2). To this end, it is interesting to comment on the relation between our method and the classical penalty approximation scheme [8, 23]. The penalty scheme consists of solving a sequence of unconstrained subproblems

(1.6)  minimize F_σ(x),  x ∈ R^n,

where F_σ is given by (1.4) with f_2 being a penalty term p, such as (1.3). (In the literature, it is more common to minimize σ^{-1} F_σ(x) = f_1(x) + σ^{-1} p(x), but the resulting subproblem is clearly equivalent to (1.6).) As is well known, under mild assumptions optimal paths of solutions x(σ) of the penalized problems (1.6) tend to the solution set of (1.2) as σ → 0. We emphasize that the requirement that penalty parameters should tend to zero is, in general, indispensable.
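As a concrete illustration of why σ → 0 is indispensable, consider a toy one-dimensional instance of our own (not from the paper) in which the constraint has no Slater point, so no exact penalty exists: minimize f_1(x) = (x − 2)² subject to x² ≤ 0, whose only feasible point is x = 0. With f_2(x) = max{0, x²} = x², the path x(σ) of minimizers of F_σ in (1.4) approaches 0 but never reaches it for any fixed σ > 0:

```python
import numpy as np

# Toy no-Slater example (ours): minimize f1(x) = (x-2)^2 s.t. x^2 <= 0.
# The only feasible point is x = 0, and the penalty path x(sigma) of
# minimizers of F_sigma = sigma*f1 + f2 reaches it only in the limit.
f1 = lambda x: (x - 2.0) ** 2
f2 = lambda x: np.maximum(0.0, x ** 2)   # penalty p(x) of the constraint x^2 <= 0

def path_point(sigma, grid=np.linspace(-1.0, 3.0, 400001)):
    """Minimize F_sigma = sigma*f1 + f2 by brute force on a fine grid."""
    F = sigma * f1(grid) + f2(grid)
    return grid[np.argmin(F)]

for sigma in [1.0, 0.1, 0.01, 0.001]:
    # F_sigma is smooth here; its exact minimizer is x(sigma) = 2*sigma/(1+sigma)
    print(sigma, path_point(sigma), 2 * sigma / (1 + sigma))
```

Every x(σ) is strictly infeasible, so the solution of (1.6) solves (1.2) for no fixed σ > 0, matching the discussion above.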
To guarantee that a solution of (1.6) is a solution of the original problem (1.2) for some fixed σ > 0 (i.e., exactness of the penalty function), some regularity assumptions on the constraints are needed (e.g., see [3, section 14.4]). No assumptions of this type are made in this paper. The fundamental issue is approximating x(σ_k) for some sequence of parameters σ_k → 0. It is clear that approximating x(σ_k) with precision is computationally impractical. It is therefore attractive to trace the optimal path in a loose (and computationally cheap) manner, while still safeguarding convergence. In a sense, this is what our method does: instead of solving the subproblems (1.6) to some prescribed accuracy, it makes just one descent step for F_{σ_k} from the current iterate x^k and immediately updates the parameter. We emphasize that this results in meaningful progress (and ultimately produces iterates converging to solutions of the problem) for arbitrary points x^k, and not just for points close to the optimal path, i.e., points close to x(σ_k).
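The two requirements in (1.5) pull in opposite directions: σ_k must vanish, but not so fast that its sum converges. A quick numerical check (ours, not from the paper) shows that the harmonic schedule σ_k = σ_0/(k + 1), used later in the experiments, satisfies both conditions, while a geometric schedule does not:

```python
# Quick check (ours) of the two parts of condition (1.5).
# Harmonic schedule sigma_k = sigma0/(k+1): sigma_k -> 0 and the partial
# sums grow like sigma0*log(N), i.e., without bound.
# Geometric schedule sigma_k = sigma0*2**(-k): tends to zero too fast,
# its sum stays bounded by 2*sigma0, violating the second part of (1.5).
sigma0 = 1.0
N = 10 ** 6
harmonic_sum = sum(sigma0 / (k + 1) for k in range(N))
geometric_sum = sum(sigma0 * 2.0 ** (-k) for k in range(N))
print(harmonic_sum, geometric_sum)   # partial sums after N terms
```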

We therefore obtain an implementable algorithm for tracing optimal paths of penalty schemes. We next discuss the relationship of our algorithm to the existing literature. For the bilevel setting of (1.1), we believe that our proposal is the first method which is completely explicit. In some ways, it is related to [4], where a proximal point method for (1.1) has been considered, and (1.5) is referred to as slow control. However, as any proximal method, the method of [4] is implicit: it requires solving nontrivial subproblems of minimizing regularizations of the functions F_{σ_k} at every iteration, even if approximately. By contrast, the method proposed in this paper is completely explicit: each iteration is a serious (or descent) step for the current F_{σ_k}, constructed by a finite number of null steps in a way which is essentially standard in nonsmooth optimization. The special case of standard optimization deserves some further comments. We next discuss bundle methods applicable to problems with nonlinear constraints, such as (1.2) above. When the problem admits exact penalization, one can solve the equivalent unconstrained problem of minimizing the exact penalty function; see [14, 18]. However, as already mentioned above, exact penalization requires regularity assumptions on the constraints, such as the Slater condition (existence of some x ∈ R^n such that g_i(x) < 0 for all i = 1, ..., m). We stress that no assumptions of this type are needed for our method. For example, our method is applicable to minimizing a nonsmooth function subject to (monotone linear) complementarity constraints

(Qx + q)_i ≥ 0,  x_i ≥ 0,  x_i (Qx + q)_i = 0,  i = 1, ..., n,

where Q is an n × n positive semidefinite matrix. Those constraints can be modeled in the form (1.2) as

−(Qx + q) ≤ 0,  −x ≤ 0,  ⟨Qx + q, x⟩ ≤ 0.

Complementarity constraints do not satisfy constraint qualifications, no matter how they are modeled, which makes this class of problems particularly difficult.
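The modeling above can be sketched directly: the three groups of convex inequalities yield, via (1.3), a lower-level penalty p that vanishes exactly at the complementary points. The data Q, q below are made up for illustration:

```python
import numpy as np

# Sketch (ours, following the text): the monotone LCP constraints
#   Qx + q >= 0,  x >= 0,  <Qx + q, x> = 0
# modeled in the form (1.2), with the penalty f2(x) = p(x) as in (1.3).
Q = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite, hence monotone
q = np.array([-3.0, -3.0])               # illustrative data, not from the paper

def p(x):
    w = Q @ x + q
    return (np.maximum(0.0, -w).sum()      # violation of -(Qx + q) <= 0
            + np.maximum(0.0, -x).sum()    # violation of -x <= 0
            + max(0.0, w @ x))             # violation of <Qx + q, x> <= 0

x_feas = np.array([1.0, 1.0])    # here Qx + q = 0 and x >= 0: complementary
x_infeas = np.array([0.0, 0.0])  # here Qx + q = (-3, -3): first group violated
print(p(x_feas), p(x_infeas))
```

Note that p has no Slater point: ⟨Qx + q, x⟩ < 0 is impossible on the feasible set, which is exactly why no constraint qualification can hold.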
We shall come back to problems with complementarity constraints in section 4, where some computational experiments are presented. The methods in [21, 22] and [15, Chap. 5] do not use penalization but enforce feasibility of every serious iterate. In particular, they require a feasible starting point, which is itself a difficult computational task (in the case of nonlinear constraints). In addition, regularity of the constraints is still needed for convergence. Bundle methods which do not use penalty functions and do not enforce feasibility are [19, 9, 24, 13]. The methods in [24, 13] share one feature in common with the one proposed here: they apply bundle techniques to a dynamically changing objective function, except that the function is different (underlying [24, 13] is the so-called improvement function, which goes back to [21, 15, 1]). The methods of [24, 13] require the Slater condition, while those in [19, 9] do not. However, [19, 9] (as well as [21, 17, 18]) need a priori boundedness assumptions on the iterates to prove convergence. For our method, we assume only that the solution set of the problem is bounded. For the standard optimization setting (1.2), this paper is also somewhat related to [5], where interior penalty schemes are coupled with continuous-time steepest descent to produce a family of paths converging to the solution set. However, concrete numerical schemes in [5] arise from implicit discretization and, thus, result in implicit proximal-point iterations, just as in [4]. Nevertheless, it was conjectured in [5] that an economic algorithm performing a single iteration of some descent method for each value of σ_k could be enough to generate a sequence of iterates converging to a solution of the

problem. This is what the presented method does, although we use exterior rather than interior penalties and consider the more general nonsmooth setting (as well as the more general bilevel setting). A related explicit descent scheme for the smooth case has been developed in [25]. Our notation is quite standard. By ⟨x, y⟩ we denote the inner product of x and y, and by ‖·‖ the associated norm, where the space is always clear from the context. For a convex function f : R^n → R, its ε-subdifferential at the point x ∈ R^n is denoted by

∂_ε f(x) = {g ∈ R^n : f(y) ≥ f(x) + ⟨g, y − x⟩ − ε for all y ∈ R^n},

where ε ∈ R_+. Then the subdifferential of f at x is given by ∂f(x) = ∂_0 f(x). If S is a closed convex set in R^n, then P_S(x) stands for the orthogonal projection of the point x ∈ R^n onto S, and dist(x, S) = ‖x − P_S(x)‖ is the distance from x to S.

2. The algorithm. As already outlined above, the conceptual idea of the algorithm is quite simple. If x^k ∈ R^n is the current approximation to a solution of (1.1) and σ_k > 0 is the current parameter defining the function F_{σ_k} in (1.4), an iteration of the method consists of making a descent step for F_{σ_k} relative to its value at x^k. After this, the value of σ_k can be changed immediately. Since the function F_{σ_k} is nonsmooth, the computationally implementable way to construct a descent step is the bundle technique [15, 11, 3]. We next introduce the notation necessary for stating our algorithm. Bundle methods keep memory of the past in a bundle of information. Let x^k be the current approximation to a solution and let y^i, i = 1, ..., ℓ − 1, be all the points that have been produced by the method so far, including the ones which have not been accepted as satisfactory (the so-called "null steps"). Generally, {x^k} is a particular subsequence of {y^i}. For an iteration index ℓ, we shall denote by k(ℓ) the index of the last iteration preceding ℓ at which x^k and σ_k have been modified. Whenever k and ℓ appear in the same expression, we mean that k = k(ℓ).
Let us denote the function and subgradient values of f_1 at the points y^i, i = 1, ..., ℓ − 1, by f_1^i = f_1(y^i), g_1^i ∈ ∂f_1(y^i), and similarly for f_2. Since σ_k g_1^i + g_2^i ∈ ∂F_{σ_k}(y^i), this information can be used to define a cutting-planes approximation Ψ_ℓ of the function F_{σ_k}, as follows:

(2.1)  Ψ_ℓ(y) := max_{i<ℓ} { σ_k f_1^i + f_2^i + ⟨σ_k g_1^i + g_2^i, y − y^i⟩ }
              = σ_k f_1(x^k) + f_2(x^k) + max_{i<ℓ} { −(σ_k e_1^{k,i} + e_2^{k,i}) + ⟨σ_k g_1^i + g_2^i, y − x^k⟩ },

where the second expression is centered at x^k and uses the linearization errors at y^i with respect to x^k:

(2.2)  e_p^{k,i} := f_p(x^k) − f_p^i − ⟨g_p^i, x^k − y^i⟩ ≥ 0,  p = 1, 2.

We note that the second representation of Ψ_ℓ in (2.1) is better suited for implementations, due to lower storage requirements. As is readily seen from the definition of ε-subgradients, it holds that

(2.3)  g_p^i ∈ ∂_{e_p^{k,i}} f_p(x^k),  p = 1, 2,

and

(2.4)  σ_k g_1^i + g_2^i ∈ ∂_{(σ_k e_1^{k,i} + e_2^{k,i})} F_{σ_k}(x^k).
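A small numerical sanity check (our own, not from the paper) confirms that the two forms of (2.1) coincide and that the model underestimates F_σ, using f_1(x) = |x| and f_2(x) = |x − 1| in one dimension; the points y^i and the center x^k below are arbitrary illustrative choices:

```python
import numpy as np

# Sanity check (ours) of the two equivalent forms of the cutting-planes
# model (2.1) and of the linearization errors (2.2), for f1(x) = |x| and
# f2(x) = |x - 1| in one dimension, with sigma = 0.5 and center x^k = 2.
f1 = lambda x: abs(x)
f2 = lambda x: abs(x - 1.0)
g1 = lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)    # subgradient of f1
g2 = lambda x: 1.0 if x > 1.0 else (-1.0 if x < 1.0 else 0.0) # subgradient of f2

sigma, xk = 0.5, 2.0
ys = [2.0, 0.3, -1.0]          # previously generated trial points y^i

def model(y):
    # first form of (2.1): max of linearizations of F_sigma at the y^i
    return max(sigma * f1(yi) + f2(yi)
               + (sigma * g1(yi) + g2(yi)) * (y - yi) for yi in ys)

def model_centered(y):
    # second form of (2.1): centered at x^k via the linearization errors (2.2)
    vals = []
    for yi in ys:
        e1 = f1(xk) - f1(yi) - g1(yi) * (xk - yi)   # e_1^{k,i} >= 0
        e2 = f2(xk) - f2(yi) - g2(yi) * (xk - yi)   # e_2^{k,i} >= 0
        vals.append(sigma * f1(xk) + f2(xk) - (sigma * e1 + e2)
                    + (sigma * g1(yi) + g2(yi)) * (y - xk))
    return max(vals)

for y in np.linspace(-2.0, 3.0, 101):
    F = sigma * f1(y) + f2(y)
    assert model(y) <= F + 1e-12                     # cuts underestimate F_sigma
    assert abs(model(y) - model_centered(y)) < 1e-12 # the two forms agree
print(model(2.0))   # the cut taken at x^k itself is tight there
```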

The linearization errors in (2.2) have to be properly updated every time x^k changes. Choosing a proximal parameter μ_ℓ > 0, we generate the next candidate point y^ℓ by solving a quadratic programming (QP) reformulation of the problem

(2.5)  min_{y ∈ R^n} { Ψ_ℓ(y) + (μ_ℓ/2) ‖y − x^k‖² }.

We note that the resulting quadratic program possesses a certain special structure, for which efficient software has been developed [16, 10]. The iterate y^ℓ is considered good enough when F_{σ_k}(y^ℓ) is sufficiently smaller than F_{σ_k}(x^k) (the so-called "serious step"; this will be made precise later). If y^ℓ is acceptable, then we set x^{k+1} := y^ℓ, choose the new σ_{k+1}, and proceed to construct a descent step for F_{σ_{k+1}}. Otherwise, a so-called "null step" is declared and the procedure continues for F_{σ_k}, using the enhanced approximation Ψ_{ℓ+1}. In order for the basic idea outlined above to be practical, some important details have to be incorporated into the design of the method, as discussed next. The number of constraints in the QP reformulation of (2.5) is precisely the number of elements in the bundle. Obviously, one has to keep this number computationally manageable. Thus, the bundle has to be compressed whenever the number of elements reaches some chosen bound. Reducing the bundle amounts to replacing the cutting-planes model (2.1) with another function, defined with a smaller number of cutting planes, which we shall still denote by Ψ_ℓ. This has to be done without impairing convergence of the algorithm. For this purpose, the so-called aggregate function is fundamental [3, Chap. 9], which we shall introduce in what follows. It is convenient to split the information kept at iteration ℓ into two separate parts. One is the oracle bundle containing subgradient values at (some of!) the points y^i, i = 1, ..., ℓ − 1, and the associated linearization errors (recall (2.3) and (2.2)):

B_ℓ^oracle ⊆ ∪_{i<ℓ} { (e_p^{k,i}, g_p^i) : e_p^{k,i} ∈ R_+, g_p^i ∈ ∂_{e_p^{k,i}} f_p(x^k), p = 1, 2 }.
Note that here the bundle B_ℓ^oracle is not required to contain information at all the previous points (this is reflected by the use of the inclusion, rather than equation, in the definition above). The other part is the aggregate bundle, obtained from solutions of the QP subproblems. This bundle contains certain special ε-subgradients at x^k, to be introduced in Lemma 2.1 below. For now, we formally set

B_ℓ^agg ⊆ ∪_{i<ℓ} { (ε̂_p^{k,i}, ĝ_p^i) : ε̂_p^{k,i} ∈ R_+, ĝ_p^i ∈ ∂_{ε̂_p^{k,i}} f_p(x^k), p = 1, 2 },

without specifying how exactly those objects are obtained. Note that here there may no longer exist any previous point y^i, i < ℓ, for which ĝ_p^i ∈ ∂f_p(y^i), p = 1, 2. The information in B_ℓ^oracle and B_ℓ^agg defines a cutting-planes approximation of F_{σ_k} given by

(2.6)  Ψ_ℓ(y) = σ_k f_1(x^k) + f_2(x^k)
       + max { max_{i ∈ B_ℓ^oracle} { −(σ_k e_1^{k,i} + e_2^{k,i}) + ⟨σ_k g_1^i + g_2^i, y − x^k⟩ },
               max_{i ∈ B_ℓ^agg} { −(σ_k ε̂_1^{k,i} + ε̂_2^{k,i}) + ⟨σ_k ĝ_1^i + ĝ_2^i, y − x^k⟩ } },

where by i ∈ B_ℓ^oracle we mean that there exists an element in the set B_ℓ^oracle indexed by i, and similarly for B_ℓ^agg. Although this notation is formally improper (the bundles are not sets of indices), it does not lead to any confusion while simplifying the formulas. We next discuss properties of the solution of the QP subproblem (2.5) with Ψ_ℓ given by (2.6). The following characterization is an adaptation of [3, Lemma 9.8] for our setting.

Lemma 2.1. For the unique solution y^ℓ of (2.5) with Ψ_ℓ given by (2.6), it holds that
(i) y^ℓ = x^k − (1/μ_ℓ)(σ_k ĝ_1^ℓ + ĝ_2^ℓ);
(ii) ĝ_p^ℓ = Σ_{i ∈ B_ℓ^oracle} λ_i^ℓ g_p^i + Σ_{i ∈ B_ℓ^agg} λ̂_i^ℓ ĝ_p^i, p = 1, 2, where λ^ℓ ≥ 0, λ̂^ℓ ≥ 0 and Σ_{i ∈ B_ℓ^oracle} λ_i^ℓ + Σ_{i ∈ B_ℓ^agg} λ̂_i^ℓ = 1;
(iii) σ_k ĝ_1^ℓ + ĝ_2^ℓ ∈ ∂Ψ_ℓ(y^ℓ);
(iv) ĝ_p^ℓ ∈ ∂_{ε̂_p^{k,ℓ}} f_p(x^k), where ε̂_p^{k,ℓ} = Σ_{i ∈ B_ℓ^oracle} λ_i^ℓ e_p^{k,i} + Σ_{i ∈ B_ℓ^agg} λ̂_i^ℓ ε̂_p^{k,i}, p = 1, 2;
(v) σ_k ĝ_1^ℓ + ĝ_2^ℓ = ĝ^ℓ ∈ ∂_{ε̂^{k,ℓ}} F_{σ_k}(x^k), where ε̂^{k,ℓ} = σ_k ε̂_1^{k,ℓ} + ε̂_2^{k,ℓ};
(vi) ε̂^{k,ℓ} = F_{σ_k}(x^k) − Ψ_ℓ(y^ℓ) − (1/μ_ℓ) ‖ĝ^ℓ‖² ≥ 0.

Proof. The assertions can be verified following the analysis in [3, Lemma 9.8], taking into account the special structure of the function F_{σ_k} and of its approximation Ψ_ℓ. We omit the details.

We note that λ^ℓ and λ̂^ℓ in Lemma 2.1 are the Lagrange multipliers associated with y^ℓ in the quadratic programming reformulation of (2.5) (or the problem variables, if one solves the dual of this quadratic program, as in [16]). In any case, λ^ℓ and λ̂^ℓ are available as part of the solution to (2.5). The quantities ĝ_p^ℓ, ε̂_p^{k,ℓ}, p = 1, 2, defined in Lemma 2.1 are precisely the ones that appear in the definition of B_ℓ^agg (except that B_ℓ^agg contains information computed at iterations previous to the ℓth; at the first iteration we formally set B_1^agg = ∅). We are now ready to introduce the aggregate function, already mentioned above:

Ψ̂^{k,ℓ}(y) := σ_k f_1(x^k) + f_2(x^k) − (σ_k ε̂_1^{k,ℓ} + ε̂_2^{k,ℓ}) + ⟨σ_k ĝ_1^ℓ + ĝ_2^ℓ, y − x^k⟩,

where

(2.7)  ĝ_p^ℓ ∈ ∂_{ε̂_p^{k,ℓ}} f_p(x^k),  p = 1, 2,

and consequently,

(2.8)  σ_k ĝ_1^ℓ + ĝ_2^ℓ ∈ ∂_{(σ_k ε̂_1^{k,ℓ} + ε̂_2^{k,ℓ})} F_{σ_k}(x^k).
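The identities of Lemma 2.1 can be observed numerically. The sketch below (ours, not from the paper) solves a one-dimensional instance of (2.5) by brute force for a two-cut model of F_σ(x) = σ|x| + |x − 1| and checks items (i), (v), and (vi); all data are illustrative choices:

```python
import numpy as np

# Brute-force illustration (ours) of Lemma 2.1(i),(v),(vi) in one dimension:
# solve the proximal subproblem (2.5) for a two-cut model of
# F_sigma(x) = sigma*|x| + |x - 1|, with sigma = 0.5, x^k = 2, mu = 2.
sigma, mu, xk = 0.5, 2.0, 2.0
F = lambda x: sigma * abs(x) + abs(x - 1.0)

# cuts of F_sigma taken at y^1 = 2 (slope sigma*1 + 1 = 1.5)
# and at y^2 = -1 (slope sigma*(-1) - 1 = -1.5)
cuts = [(F(2.0), 1.5, 2.0), (F(-1.0), -1.5, -1.0)]
Psi = lambda y: max(v + a * (y - yi) for (v, a, yi) in cuts)

grid = np.linspace(-3.0, 3.0, 600001)
obj = (np.maximum.reduce([v + a * (grid - yi) for (v, a, yi) in cuts])
       + 0.5 * mu * (grid - xk) ** 2)
y_l = grid[np.argmin(obj)]                     # approximate solution of (2.5)

g_hat = mu * (xk - y_l)                        # Lemma 2.1(i): aggregate subgradient
eps_hat = F(xk) - Psi(y_l) - g_hat ** 2 / mu   # Lemma 2.1(vi): aggregate error
print(y_l, g_hat, eps_hat)                     # about 1.25, 1.5, 0

# Lemma 2.1(v): g_hat is an eps_hat-subgradient of F_sigma at x^k
# (tolerance absorbs the grid discretization error)
for x in np.linspace(-3.0, 3.0, 61):
    assert F(x) >= F(xk) + g_hat * (x - xk) - eps_hat - 1e-4
```

Here only the cut generated at x^k itself is active at the optimum, so ε̂ is essentially zero; with other active cuts it would be strictly positive.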
As already noted above, this function is defined directly from the quantities available after solving (2.5). As pointed out in [6, eqs. (4.7)–(4.9)], to guarantee that a bundle technique would be able to construct a descent step for F_{σ_k} with respect to its value at x^k (assuming x^k is not a minimizer of F_{σ_k}), one can actually use any cutting-planes models Ψ_ℓ satisfying (for all y ∈ R^n) the following three conditions:

Ψ_ℓ(y) ≤ F_{σ_k}(y) for all ℓ ≥ 1 and all k,
Ψ̂^{k,ℓ}(y) ≤ Ψ_{ℓ+1}(y) for those ℓ for which y^ℓ is a null step,
σ_k f_1^ℓ + f_2^ℓ + ⟨σ_k g_1^ℓ + g_2^ℓ, y − y^ℓ⟩ ≤ Ψ_{ℓ+1}(y) for those ℓ for which y^ℓ is a null step.

The last two conditions mean that when defining the new bundles, it is enough for B_{ℓ+1}^oracle to contain the cutting plane computed at the new point y^ℓ (i.e., the subgradients g_1^ℓ, g_2^ℓ, and the associated linearization errors e_1^{k,ℓ}, e_2^{k,ℓ}) and for B_{ℓ+1}^agg to contain

the last aggregate function Ψ̂^{k,ℓ} (i.e., the ε-subgradients ĝ_1^ℓ, ĝ_2^ℓ, and the associated errors ε̂_1^{k,ℓ}, ε̂_2^{k,ℓ}). In particular, at any iteration the bundle can contain as few elements as we wish (as long as the two specified above are included). This fact is crucial for effective control of the size of the subproblems (2.5). Finally, to make sure that the first condition above holds for all k, the linearization and aggregate errors have to be properly updated every time x^k changes to x^{k+1} (in particular, to ensure the key relations (2.4) and (2.8)). As is readily seen, the following formulas do the job:

(2.9)  e_p^{k+1,i} = e_p^{k,i} + f_p(x^{k+1}) − f_p(x^k) + ⟨g_p^i, x^k − x^{k+1}⟩,  p = 1, 2, for i ∈ B_{ℓ+1}^oracle,
       ε̂_p^{k+1,i} = ε̂_p^{k,i} + f_p(x^{k+1}) − f_p(x^k) + ⟨ĝ_p^i, x^k − x^{k+1}⟩,  p = 1, 2, for i ∈ B_{ℓ+1}^agg.

We are now ready to formally state the algorithm.

Algorithm 2.1 (bilevel bundle method).
Step 0. Initialization. Choose a parameter m ∈ (0, 1) and an integer B_max ≥ 2. Choose x^0 ∈ R^n and σ_0 > 0, β_0 > 0. Set y^0 := x^0 and compute f_p^0, g_p^0, p = 1, 2. Set k = 0, ℓ = 1, e_p^{0,0} := 0, p = 1, 2. Define the starting bundles B_1^oracle := {(e_p^{0,0}, g_p^0), p = 1, 2} and B_1^agg := ∅.
Step 1. QP subproblem. Choose μ_ℓ > 0 and compute y^ℓ as the solution of (2.5), where Ψ_ℓ is defined by (2.6). Compute

ĝ^ℓ = μ_ℓ (x^k − y^ℓ),  ε̂^{k,ℓ} = F_{σ_k}(x^k) − Ψ_ℓ(y^ℓ) − (1/μ_ℓ) ‖ĝ^ℓ‖²,  δ_ℓ = ε̂^{k,ℓ} + (1/(2μ_ℓ)) ‖ĝ^ℓ‖².

Compute f_p^ℓ, g_p^ℓ, p = 1, 2. Compute e_p^{k,ℓ}, p = 1, 2, using (2.2) written with i = ℓ.
Step 2. Descent test. If

(2.10)  F_{σ_k}(y^ℓ) ≤ F_{σ_k}(x^k) − m δ_ℓ,

then declare a serious step. Otherwise, declare a null step.
Step 3. Bundle management. Set B_{ℓ+1}^oracle := B_ℓ^oracle and B_{ℓ+1}^agg := B_ℓ^agg. If the bundle has reached the maximum size (i.e., if |B_{ℓ+1}^oracle ∪ B_{ℓ+1}^agg| = B_max), then delete at least two elements from B_{ℓ+1}^oracle ∪ B_{ℓ+1}^agg and append the aggregate information (ε̂_p^{k,ℓ}, ĝ_p^ℓ, p = 1, 2) to B_{ℓ+1}^agg. In any case, append (e_p^{k,ℓ}, g_p^ℓ, p = 1, 2) to B_{ℓ+1}^oracle.
Step 4. If the descent test was satisfied, set x^{k+1} = y^ℓ and choose 0 < σ_{k+1} ≤ σ_k and 0 < β_{k+1} ≤ β_k.
Update the linearization and aggregate errors using (2.9). Set k = k + 1 and go to Step 5. If instead

(2.11)  max{ε̂^{k,ℓ}, ‖ĝ^ℓ‖} ≤ β_k σ_k

holds, choose 0 < σ_{k+1} < σ_k and 0 < β_{k+1} < β_k. Set x^{k+1} = x^k, k = k + 1, and go to Step 5.
Step 5. Set ℓ = ℓ + 1 and go to Step 1.
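The driving idea of Algorithm 2.1 (one descent step for F_{σ_k}, then update σ_k) can be sketched in a few lines. The sketch below is ours and is emphatically not the bundle method itself: the null-step machinery is replaced by a single backtracking (Armijo) gradient step, which is only adequate because the toy F_σ used here is smooth. It reuses the no-Slater example f_1(x) = (x − 2)², f_2(x) = x², whose bilevel solution is x = 0:

```python
# Much-simplified sketch (ours) of the driving idea of Algorithm 2.1,
# NOT the bundle method itself: one descent step for F_{sigma_k}, then
# update sigma_k, with sigma_k = sigma0/(k+1) as in (1.5).
# Toy smooth data: f1(x) = (x-2)^2, f2(x) = x^2 (penalty of x^2 <= 0).
def F(sigma, x):
    return sigma * (x - 2.0) ** 2 + x ** 2

def grad_F(sigma, x):
    return 2.0 * sigma * (x - 2.0) + 2.0 * x

x = 2.0
for k in range(200):
    sigma = 1.0 / (k + 1)     # sigma_k -> 0, sum sigma_k = +inf, cf. (1.5)
    g = grad_F(sigma, x)
    t = 1.0
    # backtrack until a sufficient-decrease condition holds; this plays
    # the role of the serious-step test (2.10) in this smooth toy setting
    while F(sigma, x - t * g) > F(sigma, x) - 0.1 * t * g * g:
        t *= 0.5
    x -= t * g                # one descent step, then sigma is updated

print(x)   # approaches the bilevel solution x = 0
```

One descent step per parameter value suffices here because the iterate only needs to loosely track the penalty path x(σ) = 2σ/(1 + σ), exactly as discussed in the introduction.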

The role of checking condition (2.11) is to detect the situation when the point x^k happens to be a minimizer of the function F_{σ_k} (or is almost a minimizer; recall Lemma 2.1(v)). If it is so, we immediately update the parameter σ_k. This is reasonable, since we are not interested in minimizing F_{σ_k}. The case of x^k being a minimizer of F_{σ_k}, however, is very unlikely to occur, since at no iteration k is the function F_{σ_k} minimized with any prescribed precision. This is also confirmed by our numerical experiments in section 4, where we ignored the safeguard (2.11) in our implementation. The algorithm does not have an overall stopping test. In the unconstrained case, a reliable stopping test is one of the important advantages of bundle methods (as compared, for example, to subgradient methods). However, the lack of a stopping test in our setting cannot be considered a drawback of the algorithm. Indeed, a bilevel problem does not admit an explicit optimality condition. Actually, the same is in general already true for constrained optimization without a regularity assumption on the constraints (except for some special cases, of course). As a result, there is no explicit way to measure violation/satisfaction of optimality in (1.1), and, consequently, the lack of a stopping test is inherent in the nature of the problem. We note that there is certain freedom in updating or not updating the parameter σ_k after every iteration. While our goal is to show that we can update it after a single descent step, note that, in principle, we are not obliged to do so (σ_{k+1} = σ_k is allowed, unless (2.11) holds; in the latter case, x^k almost minimizes F_{σ_k} and it does not make sense to insist on further descent for this function). For convergence, it would be required that σ_k not go to zero too fast, in the sense of condition (1.5) stated above. In the case of the standard optimization problem (1.2), this condition allows a natural interpretation.
In order to be able to trace the optimal penalty path x(σ) with such a relaxed precision (making just one descent step for each penalized subproblem (1.6)), we should not be jumping too far from the target x(σ_k) on the path to the next target x(σ_{k+1}) as we move along. On the other hand, if σ_k is kept constant over a few descent iterations, this allows for a more rapid change in the parameter for the next iteration, while still guaranteeing the second condition in (1.5). This is intuitively reasonable: if we get closer to the optimal path, then the target can be moved further. In our numerical experiments in section 4, we have used the simplest generic choice of σ_k = σ_0/(k + 1). We have experimented with some other options (for example, keeping the parameter unchanged for some iterations), but found that this does not make much difference (for our test problems). We shall discuss this further in section 4.

3. Convergence analysis. In our convergence analysis, we assume that the objective function f_1 is bounded below; i.e., −∞ < f̄_1 = inf{f_1(x) : x ∈ R^n}. Since we also assume that the problem is solvable, the function f_2 is automatically bounded below, and we define −∞ < f̄_2 = min{f_2(x) : x ∈ R^n}. For the subsequent analysis, it is convenient to think of Algorithm 2.1 as applied to the shifted function

(3.1)  F_σ(x) = σ(f_1(x) − f̄_1) + (f_2(x) − f̄_2),

instead of the function F_σ given by (1.4), as stated originally. We can do this because Algorithm 2.1 would generate the same iterates whether F_σ were given by (3.1) or

(1.4). Indeed, the two functions have the same subgradients and the same difference of function values at any two points. Hence, the cutting-planes models (2.6) for the two functions would differ by a constant term (not dependent on y). This means that the solutions y^ℓ of the QP subproblems (2.5) would be the same, as well as the quantities ĝ^ℓ and ε̂^{k,ℓ}, which are defined by those solutions. Therefore, the relations in (2.10) and (2.11), which are guiding the algorithm, also do not change. From now on, we consider that the method is applied to the function F_σ defined by (3.1) (even though the function from (1.4) is used in reality, of course). This is convenient for the subsequent analysis and should not lead to any confusion. We proceed to prove convergence of the algorithm.

Proposition 3.1. Let f_1 and f_2 be convex functions. If for consecutive null steps it holds that μ̄ ≥ μ_{ℓ+1} ≥ μ_ℓ > 0, then Algorithm 2.1 is well defined and either (2.10) or (2.11) (or both) hold infinitely often. In particular, the parameter σ_k is updated infinitely often.

Proof. Let k be any iteration index and consider the sequence of null steps applied to the current (fixed over those null steps) function F_{σ_k}. By properties of standard bundle methods (e.g., [3, Thm. 9.15]), it holds that either the descent test (2.10) is satisfied after a finite number of null steps, or x^k is a minimizer of F_{σ_k}. In the latter case, it further holds that δ_ℓ → 0 as ℓ → ∞. Hence, ĝ^ℓ → 0 and ε̂^{k,ℓ} → 0 as ℓ → ∞. This means that condition (2.11) would be satisfied after a finite number of null steps. We have therefore established that either (2.10) or (2.11) is guaranteed to be satisfied after a finite number of null steps. This shows that the method is well defined and updates σ_k infinitely often.

We next prove that the generated sequence {x^k} is bounded and that its accumulation points are feasible for problem (1.1).

Proposition 3.2. Let f_1 and f_2 be convex functions such that f_1 is bounded below on R^n and the solution set S_1 of problem (1.1) is nonempty and bounded.
Suppose that μ̄ ≥ μ_ℓ ≥ μ̂ > 0 for all iterations, that μ_{ℓ+1} ≥ μ_ℓ on consecutive null steps, and that σ_k → 0 as k → ∞. Then any sequence {x^k} generated by Algorithm 2.1 is bounded and all its accumulation points are feasible for problem (1.1); i.e., they belong to S_2.

Proof. If the serious-step descent test (2.10) is satisfied only a finite number of times, it is readily seen that there exists some iteration index k_0 such that x^k = x^{k_0} for all k ≥ k_0 (because x^k is changed only at serious steps, i.e., when (2.10) holds). Hence, in this case {x^k} is trivially bounded. Assume now that (2.10) is satisfied infinitely often. In what follows, we consider the subsequence of indices k at which (2.10) holds, i.e., at which x^k changes. To simplify the notation, we shall not introduce this subsequence explicitly. Here, we can simply disregard all the iterations at which x^k remained fixed. We can do this within the current analysis of boundedness of {x^k}, because those iterations merely changed σ_k (and the only assumption on the latter used below is that it should be nonincreasing, a property which holds for any subsequence of {σ_k} by the construction of the method). For each k, let ℓ(k) be the index for which (2.10) was satisfied (in particular, x^{k+1} = y^{ℓ(k)}). By (2.10), it holds that

m δ_{ℓ(k)} ≤ F_{σ_k}(x^k) − F_{σ_k}(x^{k+1})
          = σ_k (f_1(x^k) − f̄_1) − σ_k (f_1(x^{k+1}) − f̄_1) + (f_2(x^k) − f̄_2) − (f_2(x^{k+1}) − f̄_2).

10 BUNDLE METHOD FOR BILEVEL NONSMOOTH MINIMIZATION 251 Summing up the atter inequaities for k =0,...,k 1, we obtain that k 1 m δ (k) σ 0 (f 1 (x 0 ) f k )+ (σ k+1 σ k )(f 1 (x k+1 ) f 1 ) k=0 k=0 σ k1 (f 1 (x k1+1 ) f 1 )+(f 2 (x 0 ) f 2 ) (f 2 (x k1+1 ) f 2 ) σ 0 (f 1 (x 0 ) f 1 )+(f 2 (x 0 ) f 2 ), where we have used the facts that, for a k, f 1 (x k ) f 1, f 2 (x k ) f 2, and 0 <σ k+1 σ k. Letting k 1, we concude that (3.2) In particuar, δ (k) m 1 (σ 0 (f 1 (x 0 ) f 1 )+(f 2 (x 0 ) f 2 )) < +. k=0 (3.3) δ (k) 0 as k. Take any x S 1. Using Lemma 2.1(i), we obtain that x k+1 x 2 = x k x 2 2 ĝ (k),x k x + 1 ĝ (k) 2 μ (k) (3.4) x k x μ (k) = x k x μ (k) δ (k) μ 2 (k) ( F σk ( x) F σk (x k )+ˆε k,(k) + 1 2μ (k) ĝ (k) μ (k) ( σk (f 1 ( x) f 1 (x k )) + f 2 ( x) f 2 (x k ) ) x k x μ (k) δ (k) + 2σ k μ (k) ( f1 ( x) f 1 (x k ) ), where the first inequaity is by Lemma 2.1(v), and the ast is by the fact that f 2 ( x) f 2 (x k ), since x S 1 S 2. We next consider separatey the foowing two possibe cases: Case 1. There exists k 2 such that f 1 ( x) f 1 (x k ) for a k k 2. Case 2. For each k, there exists k 3 k such that f 1 ( x) >f 1 (x k3 ). Case 1. For k k 2, we obtain from (3.4) that ) (3.5) x k+1 x 2 x k x 2 + 2ˆμ δ (k). Recaing (3.2), we concude that { x k x 2 } converges (see, e.g., [23, Lem. 2, p. 44]). Hence, {x k } is bounded. Case 2. For each k, define i k = max{i k f 1 ( x) >f 1 (x i )}. In the case under consideration, it hods that i k when k. We first show that {x i k } is bounded. Observe that S 1 = {x S 2 f 1 (x) f 1 ( x)} = {x R n max{f 2 (x) f 2,f 1 (x) f 1 ( x)} 0}.

11 252 MIKHAIL V. SOLODOV By assumption, the set S 1 is nonempty and bounded. Therefore, the convex function φ : R n R, φ(x) = max{f 2 (x) f 2,f 1 (x) f 1 ( x)} has a particuar eve set {x R n φ(x) 0} which is nonempty and bounded. It foows that a eve sets of φ are bounded (see, e.g., [2, Prop ]), i.e., L φ (c) ={x R n φ(x) c} is bounded for any c R. Since f 1 (x) f 1 0 for a x R n and 0 <σ k+1 σ k, it hods that Hence, F σk+1 (x) F σk (x) for a x R n. 0 F σk+1 (x k+1 ) F σk (x k+1 ) F σk (x k ), where the third inequaity foows from (2.10). The above reations show that {F σk (x k )} is nonincreasing and bounded beow. Hence, it converges. It then easiy foows that {f 2 (x k ) f 2 } is bounded (because both terms in F σk (x k )=σ k (f 1 (x k ) f 1 )+(f 2 (x k ) f 2 ) are nonnegative). Fix any c 0 such that f 2 (x k ) f 2 c for a k. Since f 1 (x i k ) f 1 ( x) < 0 c (by the definition of the index i k ), we have that x i k L φ (c), which is a bounded set. This shows that {x i k } is bounded. By the definition of i k, it further hods that Hence, from (3.4), we have that f 1 ( x) f 1 (x i ), i = i k +1,...,k (if k>i k ). x i+1 x 2 x i x 2 + 2ˆμ δ (i), i = i k +1,...,k. Therefore, for any k, it hods that (3.6) x k x 2 x i k x 2 + 2ˆμ x i k x 2 + 2ˆμ k 1 i=i k +1 i=i k +1 δ (i) δ (i). Recaing that i k, by (3.2) we have that (3.7) i=i k +1 δ (i) 0 as k. Taking aso into account the boundedness of {x i k }, the reation (3.6) impies that the whoe sequence {x k } is bounded. We next show that a accumuation points of {x k } beong to S 2. For each k, either (2.10) or (2.11) hods. Regardess of whether both conditions hod infinitey often or ony one does, it is easy to see that (3.8) ĝ (k) 0 and ˆε k,(k) 0 as k,

12 BUNDLE METHOD FOR BILEVEL NONSMOOTH MINIMIZATION 253 where (3.3) is used if (2.10) hods infinitey often, and (2.11) is used directy. Let x R n be arbitrary but fixed. By Lemma 2.1(v), (3.9) σ k f 1 (x)+f 2 (x) σ k f 1 (x k )+f 2 (x k )+ ĝ (k),x x k ˆε k,(k). Let x be any accumuation point of {x k }. Using boundedness of {x k }, the continuity of f 1 and f 2, the fact that σ k 0 and (3.8), and passing onto the imit in (3.9) aong the subsequence which converges to x, we obtain that f 2 (x) f 2 (x ), where x R n is arbitrary. Hence, x S 2. The rest of the proof is done separatey for the foowing two cases: the number of serious steps when (2.10) is satisfied is either infinite or finite. Theorem 3.3. Let f 1 and f 2 be convex functions such that f 1 is bounded beow on R n and the soution set S 1 of probem (1.1) is nonempty and bounded. Suppose that μ μ ˆμ >0 for a iterations, and that μ +1 μ on consecutive nu steps. If serious step descent test (2.10) is satisfied an infinite number of times and we choose {σ k } according to (1.5) and {β k } 0 as k, then dist(x k,s 1 ) 0 as k, and a accumuation points of {x k } are soutions of (1.1). Proof. Take any x S 1. We again consider separatey the two possibe cases introduced in the proof of Proposition 3.2: Case 1. There exists k 2 such that f 1 ( x) f 1 (x k ) for a k k 2. Case 2. For each k, there exists k 3 k such that f 1 ( x) >f 1 (x k3 ). Case 2. Recaing that i k = max{i k f 1 ( x) >f 1 (x i )} so that f 1 (x i k ) <f 1 ( x), by the continuity of f 1 it hods that f 1 (x ) f 1 ( x) for any accumuation point x of {x i k }. Since a accumuation points of {x k } beong to S 2 (as estabished in Proposition 3.2), it must be the case that a accumuation points of {x i k } are soutions of the probem. In particuar, (3.10) dist(x i k,s 1 ) 0 as k. For each k, define x k = P S1 (x i k ). Using (3.6) with x = x k gives dist(x k,s 1 ) 2 x k x k 2 dist(x i k,s 1 ) 2 + 2ˆμ i=i k +1 δ (i). 
Passing onto the limit in the latter relation as k → ∞, and using (3.7) and (3.10), we obtain that dist(x^k, S_1) → 0.

Case 1. As has been shown in Proposition 3.2, in this case the sequence {‖x^k − x̄‖} converges for any x̄ ∈ S_1. Therefore, if we establish that {x^k} has an accumulation point x* ∈ S_1, it would immediately follow that {‖x^k − x*‖} → 0; i.e., the whole sequence {x^k} converges to x* ∈ S_1. Suppose first that (2.11) is satisfied only a finite number of times. Suppose further that there is no accumulation point of {x^k} which solves (1.1). Since, by Proposition 3.2, all accumulation points are feasible for (1.1), the second assumption means that lim inf_{k→∞} f_1(x^k) > f_1(x̄), where x̄ ∈ S_1. In particular, there exists t > 0 such that

f_1(x̄) ≤ f_1(x^k) − t for all k ≥ k_4. We then obtain from (3.4) that for k > k_4, it holds that

‖x^{k+1} − x̄‖² ≤ ‖x^k − x̄‖² + (2/μ̂) δ_{ℓ(k)} − (2t/μ̄) σ_k ≤ ‖x^{k_4} − x̄‖² + (2/μ̂) Σ_{i=k_4}^{k} δ_{ℓ(i)} − (2t/μ̄) Σ_{i=k_4}^{k} σ_i.

Passing onto the limit when k → ∞ in the latter relation, we obtain

(2t/μ̄) Σ_{i=k_4}^{∞} σ_i ≤ ‖x^{k_4} − x̄‖² + (2/μ̂) Σ_{i=k_4}^{∞} δ_{ℓ(i)},

which is a contradiction, due to (3.2) and (1.5). Hence, lim inf_{k→∞} f_1(x^k) = f_1(x̄). Since {x^k} is bounded, it must have an accumulation point x* such that f_1(x*) = f_1(x̄). As x* ∈ S_2, this means that x* ∈ S_1.

Finally, suppose that (2.11) is satisfied an infinite number of times. Consider the subsequence of indices k for which (2.11) holds (we shall not specify it explicitly), and let ℓ(k) denote the associated index in (2.11). We have that

(3.11) max{σ_k^{−1} ε̂_{k,ℓ(k)}, σ_k^{−1} ‖ĝ^{ℓ(k)}‖} ≤ β_k, β_k → 0.

Taking any x ∈ S_2 and using Lemma 2.1(v), we have that

σ_k f_1(x) + f_2(x) ≥ σ_k f_1(x^k) + f_2(x^k) + ⟨ĝ^{ℓ(k)}, x − x^k⟩ − ε̂_{k,ℓ(k)}.

Since f_2(x) ≤ f_2(x^k) for any x ∈ S_2, we obtain

(3.12) f_1(x) ≥ f_1(x^k) + ⟨σ_k^{−1} ĝ^{ℓ(k)}, x − x^k⟩ − σ_k^{−1} ε̂_{k,ℓ(k)}.

Hence, passing onto the limit in (3.12) as k → ∞ along some subsequence converging to x*, and taking into account (3.11), we conclude that f_1(x) ≥ f_1(x*) for any x ∈ S_2. Therefore, x* ∈ S_1 also in this case, which concludes the proof.

It remains to consider the case of a finite number of serious steps in Algorithm 2.1. As already discussed above, this is rather unlikely to occur. Actually, as the next result shows, it can happen only if we hit an exact solution of the problem, which is generally an exceptional situation.

Theorem 3.4. Let f_1 and f_2 be convex functions such that f_1 is bounded below on R^n and the solution set S_1 of problem (1.1) is nonempty and bounded. Suppose that μ̄ ≥ μ_ℓ ≥ μ̂ > 0 for all iterations, and that μ_{ℓ+1} ≥ μ_ℓ on consecutive null steps. If the serious step descent test (2.10) is satisfied a finite number of times, and we choose {σ_k} → 0 and {β_k} → 0 as k → ∞, then there exists an iteration index k_0 such that x^k = x^{k_0} for all k ≥ k_0, and x^{k_0} ∈ S_1.

Proof.
Since x^k is changed only when (2.10) holds, it is readily seen that x^k = x^{k_0} for all k ≥ k_0. By Proposition 3.2, we have that x^{k_0} ∈ S_2. By Proposition 3.1, we have that for all k ≥ k_0, σ_k is updated when (2.11) holds. For each k, let ℓ(k) denote the index for which (2.11) is satisfied. We have that

(3.13) max{σ_k^{−1} ε̂_{k,ℓ(k)}, σ_k^{−1} ‖ĝ^{ℓ(k)}‖} ≤ β_k, β_k → 0.

Taking any x ∈ S_2 and using Lemma 2.1(v), we have that

σ_k f_1(x) + f_2(x) ≥ σ_k f_1(x^{k_0}) + f_2(x^{k_0}) + ⟨ĝ^{ℓ(k)}, x − x^{k_0}⟩ − ε̂_{k,ℓ(k)}.

Since f_2(x) = f_2(x^{k_0}) for any x ∈ S_2, we obtain

(3.14) f_1(x) ≥ f_1(x^{k_0}) + ⟨σ_k^{−1} ĝ^{ℓ(k)}, x − x^{k_0}⟩ − σ_k^{−1} ε̂_{k,ℓ(k)}.

Hence, passing onto the limit in (3.14) as k → ∞ and taking into account (3.13), we conclude that f_1(x) ≥ f_1(x^{k_0}) for any x ∈ S_2. Therefore, x^{k_0} ∈ S_1, as claimed.

4. Computational experiments. In this section, we report on some numerical experiments for the problem of minimizing a piecewise quadratic convex function over a set defined by monotone linear complementarity constraints. Specifically, we consider the problem

(4.1) minimize max_{j=1,…,ℓ} {⟨A_j x, x⟩ + ⟨b_j, x⟩ + c_j} subject to Qx + q ≥ 0, x ≥ 0, ⟨x, Qx + q⟩ ≤ 0,

where Q and A_j, j = 1,…,ℓ, are n × n positive semidefinite matrices; q and b_j, j = 1,…,ℓ, are vectors in R^n; and c_j ∈ R, j = 1,…,ℓ. This problem is converted to the setting of the paper by choosing

f_1(x) = max_{j=1,…,ℓ} {⟨A_j x, x⟩ + ⟨b_j, x⟩ + c_j},
f_2(x) = Σ_{i=1}^{n} max{−x_i, 0} + Σ_{i=1}^{n} max{−(Qx + q)_i, 0} + max{⟨Qx + q, x⟩, 0}.

The code is written in MATLAB, essentially by making modifications to a more-or-less standard unconstrained proximal bundle code. Runs are performed under MATLAB (R14). The test problems were constructed by first generating a feasible point x̄ of (4.1), and then a function f_1 for which x̄ is optimal. Details are presented next. The process starts with defining an n × n positive semidefinite matrix Q of rank r < n, whose entries are uniformly distributed in the interval [−5, 5]. We next generate a point x̄, with each coordinate having equal probability of being zero or being uniformly distributed in [0, 5]. Finally, we define q = −Qx̄ + ȳ, where a coordinate of ȳ is zero if the corresponding coordinate of x̄ is positive, while the other coordinates of ȳ have equal probability of being zero or being uniformly generated from [0, 5].
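The test-problem generation just described can be sketched as follows. The paper's code is MATLAB; this is an illustrative Python/NumPy version, and the factored construction of the rank-r matrix Q (rather than drawing its entries directly from [−5, 5]) is our assumption, made so that positive semidefiniteness and the rank are guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_test_problem(n=5, r=4):
    # Rank-r PSD matrix Q (the factored form is an assumption of this sketch;
    # the paper draws the entries of Q from [-5, 5]).
    B = rng.uniform(-1.0, 1.0, size=(r, n))
    Q = B.T @ B
    # Each coordinate of xbar: zero with probability 1/2, else uniform in [0, 5].
    xbar = np.where(rng.random(n) < 0.5, 0.0, rng.uniform(0.0, 5.0, n))
    # ybar_i = 0 where xbar_i > 0; elsewhere zero or uniform in [0, 5].
    ybar = np.where(xbar > 0, 0.0,
                    np.where(rng.random(n) < 0.5, 0.0, rng.uniform(0.0, 5.0, n)))
    q = -Q @ xbar + ybar  # so that Q @ xbar + q = ybar >= 0
    return Q, q, xbar

Q, q, xbar = make_test_problem()
slack = Q @ xbar + q
print(np.all(xbar >= 0), np.all(slack >= -1e-10), abs(xbar @ slack) <= 1e-10)
```

By construction x̄ is feasible: x̄ ≥ 0, the slack Qx̄ + q equals ȳ ≥ 0, and complementarity ⟨x̄, Qx̄ + q⟩ = 0 holds up to roundoff.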
As can be easily seen, such x̄ is a feasible point for problem (4.1). It does not satisfy strict complementarity and, typically, is not an isolated feasible point (here, it is important that Q is a degenerate matrix). Obviously, x̄ is an unconstrained minimizer of the function f_2, i.e., x̄ ∈ S_2. Next, we construct a function f_1 such that x̄ is a minimizer of f_1 over S_2. As the constraints in (4.1) do not satisfy a constraint qualification, we can only overestimate the tangent cone T_{S_2}(x̄) to S_2 at x̄, which gives an underestimation of its dual:

(4.2) (T_{S_2}(x̄))° ⊇ K = cone({−e_i | x̄_i = 0} ∪ {−Q_i | ȳ_i = 0} ∪ {q + (Q + Qᵀ)x̄}),

where e_i is the ith element of the canonical basis of R^n, Q_i is the ith row of the matrix Q, and cone(X) stands for the conic hull of the set X in R^n.
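For reference, the lower-level penalty f_2 and one of its subgradients are cheap to evaluate. The following Python sketch (the actual implementation is MATLAB; the function name is ours) returns the value of f_2 together with a subgradient assembled from the active penalty pieces.

```python
import numpy as np

def f2_oracle(x, Q, q):
    """Value and one subgradient of
    f2(x) = sum_i max{-x_i, 0} + sum_i max{-(Qx+q)_i, 0} + max{<Qx+q, x>, 0}."""
    s = Q @ x + q          # slack of the linear constraints
    t = float(s @ x)       # bilinear (complementarity) term
    val = np.maximum(-x, 0.0).sum() + np.maximum(-s, 0.0).sum() + max(t, 0.0)
    g = np.zeros_like(x)
    g -= (x < 0).astype(float)         # subgradients of the max{-x_i, 0} terms
    g -= Q.T @ (s < 0).astype(float)   # subgradients of the max{-(Qx+q)_i, 0} terms
    if t > 0:                          # gradient of <Qx+q, x> where it is positive
        g += q + (Q + Q.T) @ x
    return val, g

# f2 vanishes exactly on the feasible set and is positive elsewhere:
Q = np.eye(2); q = np.ones(2)
print(f2_oracle(np.zeros(2), Q, q)[0])            # feasible point
print(f2_oracle(np.array([-1.0, 0.0]), Q, q)[0])  # violates x >= 0
```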

We shall construct the needed function f_1 by defining antigradients of the pieces of f_1 active at x̄ as some elements belonging to the right-hand side of (4.2). This would guarantee the optimality condition

(4.3) 0 ∈ ∂f_1(x̄) + (T_{S_2}(x̄))°,

even though the set (T_{S_2}(x̄))° is not fully known. First, we generate symmetric n × n positive semidefinite matrices A_j, j = 1,…,ℓ, with random entries distributed in [−5, 5]. Choosing the number ℓ_0 ≤ ℓ of pieces of f_1 active at x̄, we next define

b_j = −2A_j x̄ − u_j, u_j ∈ K, j = 1,…,ℓ_0,

where the elements u_j of K are generated by taking random coefficients in [0, 1] for all vectors in the right-hand side of (4.2). The elements b_j, j = ℓ_0 + 1,…,ℓ, are generated randomly. It remains to make sure that the first ℓ_0 pieces in the definition of f_1 are active at x̄. To this end, we compute

c̄ = 5 + max_{j=1,…,ℓ} {⟨A_j x̄, x̄⟩ + ⟨b_j, x̄⟩}

and set

c_j = c̄ − ⟨A_j x̄, x̄⟩ − ⟨b_j, x̄⟩, j = 1,…,ℓ_0, c_j = 0, j = ℓ_0 + 1,…,ℓ.

It can be seen that for the point x̄, the maximum in the definition of f_1 is attained for the indices j = 1,…,ℓ_0, and that f_1(x̄) = c̄. By the previous constructions, we have that (4.3) holds, and thus x̄ is a solution of (4.1). Furthermore, the optimal value of this problem is c̄.

Our code is a slightly simplified version of Algorithm 2.1, in particular in the following two details. First, instead of an aggregation technique to control the bundle, we use simple selection of active pieces; i.e., after every iteration we discard those cutting planes which correspond to zero multipliers in the solution of the QP subproblem. Second, we ignore the safeguard (2.11) that detects when the current point x^k is almost a minimizer of F_{σ_k}, so that σ_k needs to be reduced (even if a serious step has not yet been constructed). As already discussed above, since at no iteration F_{σ_k} is being minimized to any specific precision, this situation is unlikely to occur prematurely if σ_k is updated after each serious step. This intuition was confirmed by our experiments.
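The construction of f_1 above can be mirrored in code to sanity-check the generated instances. This is an illustrative Python sketch of the paper's MATLAB procedure (function names and random scalings are our assumptions); the pieces are built exactly as in the displayed formulas, so that the first ℓ_0 pieces are active at x̄ with antigradients u_j ∈ K.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_f1(Q, q, xbar, ell=5, ell0=3):
    """Pieces (A_j, b_j, c_j) of f1 = max_j <A_j x, x> + <b_j, x> + c_j,
    built so that pieces 1..ell0 are active at xbar."""
    n = xbar.size
    ybar = Q @ xbar + q
    # Generators of the cone K from (4.2).
    gens = [-np.eye(n)[i] for i in range(n) if xbar[i] == 0]
    gens += [-Q[i] for i in range(n) if ybar[i] == 0]
    gens.append(q + (Q + Q.T) @ xbar)
    G = np.array(gens)
    A, b, c = [], [], []
    for j in range(ell):
        M = rng.uniform(-1.0, 1.0, (n, n))
        A.append(M @ M.T)  # symmetric positive semidefinite
        if j < ell0:
            u = rng.uniform(0.0, 1.0, len(G)) @ G  # u_j in K
            b.append(-2.0 * A[j] @ xbar - u)       # antigradient of piece j is u_j
        else:
            b.append(rng.uniform(-5.0, 5.0, n))
        c.append(0.0)
    vals = [A[j] @ xbar @ xbar + b[j] @ xbar for j in range(ell)]
    cbar = 5.0 + max(vals)
    for j in range(ell0):
        c[j] = cbar - vals[j]  # forces activity of piece j at xbar
    return A, b, c, cbar

def f1(x, A, b, c):
    return max(Aj @ x @ x + bj @ x + cj for Aj, bj, cj in zip(A, b, c))
```

At x̄, every constructed active piece evaluates to c̄, while the remaining pieces stay at least 5 below it, so f_1(x̄) = c̄, the optimal value used later in the accuracy measure R_1.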
We observed that optimality is achieved only asymptotically, and so the standard bundle stopping test,

(4.4) ε̂_{k,ℓ} ≤ t_1 and ‖ĝ^ℓ‖² ≤ t_2,

can be used without any harm. But, of course, one has to be aware that this stopping test cannot be fully reliable in our setting. In our experiments, we set t_1 = 10^{−2} and t_2 = 10^{−4}, as it is often difficult to get more precision from a nondifferentiable optimization code in a simple MATLAB implementation. We start with x^0 = (2,…,2), and set m = 10^{−1} in the descent test (2.10). The proximal parameter μ_ℓ in (2.5) is changed at serious steps only, by the safeguarded version of the reversal quasi-Newton scalar update; see [3, section 9.3.3]. More precisely,

μ_{k+1} = min{c_1, max{μ̃_{k+1}, c_2}},
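The stopping rule (4.4), the safeguarded proximal update just displayed, and the weight schedule (4.5) given below are all one-liners in code. A hedged Python sketch (function names are ours; c_1 = 10, c_2 = 10^{−1}, and σ_0 = 10 are the values used in the experiments):

```python
import numpy as np

def stop_test(eps_hat, g_hat, t1=1e-2, t2=1e-4):
    # Standard bundle stopping test (4.4).
    return eps_hat <= t1 and float(np.dot(g_hat, g_hat)) <= t2

def safeguarded_mu(mu_qn, c1=10.0, c2=0.1):
    # Clamp the reversal quasi-Newton value into the safeguard interval [c2, c1].
    return min(c1, max(mu_qn, c2))

def sigma(k, sigma0=10.0):
    # Generic weight schedule (4.5): sigma_k = sigma_0/(k+1); it vanishes,
    # while its partial sums diverge, as the convergence theory requires.
    return sigma0 / (k + 1)

print(safeguarded_mu(250.0), safeguarded_mu(1e-3), safeguarded_mu(1.7))
print(sigma(0), sigma(9))
```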

where μ̃_{k+1} is the value prescribed by [3, section 9.3.3], and c_1 = 10, c_2 = 10^{−1}. Subproblems (2.5) are solved by applying the MATLAB QP routine qp.m to the dual formulation of (2.5). For updating the weight parameter, we use the simple generic choice

(4.5) σ_k = σ_0/(k + 1).

For lower dimensions (say, n = 5), when fewer iterations are expected, we start with σ_0 = 10. For higher dimensions (say, n = 10), when more iterations are typically needed, we start with σ_0 = 20. We have experimented with other possibilities, like keeping σ_k fixed over some number of serious steps, as well as with some more involved strategies. While improvements are possible, at this time we did not find them significant enough, with respect to the simple rule (4.5), to warrant their description. Generally, our experiments are intended merely to verify that the proposed algorithm works, and in a reasonable way. We did not spend much time on tuning various parameters to obtain an efficient code. To achieve this, as a first step, one should dispense with the generic qp.m MATLAB QP solver, which is known to be problematic (and was observed to be a limitation in our experiments as well). Instead, some good specialized solver (e.g., based on [16, 10]) has to be employed. Our results are summarized in Table 4.1. We report on problems of dimensions n = 5 and n = 10, with various degrees of degeneracy of the matrix Q, i.e., for different values of rank Q = r < n.

Table 4.1: Summary of numerical experiments.

  n = 5,  rank Q = 4:  converged 18 cases (38.3 oracle calls; R_1 = , R_2 = );  failed 2 cases (100 oracle calls; R_1 = , R_2 = )
  n = 5,  rank Q = 2:  converged 19 cases (32.2 oracle calls; R_1 = , R_2 = );  failed 1 case (100 oracle calls; R_1 = , R_2 = )
  n = 10, rank Q = 8:  converged 12 cases (R_1 = , R_2 = );  failed 8 cases (200 oracle calls; R_1 = , R_2 = )
  n = 10, rank Q = 5:  converged 14 cases (89.9 oracle calls; R_1 = , R_2 = );  failed 6 cases (200 oracle calls; R_1 = , R_2 = )
  n = 10, rank Q = 2:  converged 16 cases (60.6 oracle calls; R_1 = , R_2 = );  failed 4 cases (200 oracle calls; R_1 = , R_2 = )
Note that the number of constraints in (4.1) is 2n + 1. For each pair of n and r, the results are averaged over 20 runs. For all the problems, ℓ = 5 and ℓ_0 = 3; i.e., f_1 is defined by a maximum of five quadratic functions, with three of them being active at x̄. We found that moderate variations of ℓ and ℓ_0 do not change much of the average behavior of the method. We thus keep them fixed in our report, to simplify the table. We report the number of times (out of 20 runs) that convergence was declared according to the stopping rule (4.4), and the number of times this did not happen (declared a failure) after the maximal allowed number of calls to the oracle (i.e., evaluations of f_1, f_2, and of their subgradients). In the case of n = 5, the maximal number of oracle calls is 100, and in the case of n = 10, it is 200. For both outcomes, we report the average number of oracle calls at termination (which is redundant in the case of failures) and the averages of the relative accuracy achieved with respect to the optimal value c̄ of problem (4.1) and of the (in)feasibility measure (the optimal value of f_2, which is zero). Specifically, in Table 4.1, we denote

R_1 = (f_1(x^k) − c̄)/(f_1(x^0) − c̄), R_2 = f_2(x^k)/f_2(x^0),

where x^k is the last serious iterate before termination.
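As a small illustration, the accuracy measures R_1 and R_2 just defined can be computed as follows (a trivial Python sketch; the function name is ours):

```python
def relative_accuracies(f1_k, f1_0, f2_k, f2_0, cbar):
    # R1: relative optimality gap w.r.t. the optimal value cbar of (4.1);
    # R2: relative infeasibility (f2 vanishes exactly on the feasible set).
    return (f1_k - cbar) / (f1_0 - cbar), f2_k / f2_0

print(relative_accuracies(1.5, 6.0, 0.05, 1.0, 1.0))
```

Both measures are normalized by the starting point x^0, so R_1 = R_2 = 1 initially, and both tend to zero as the iterates approach a solution of (4.1).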

We note that even in the cases of failure the method actually makes reasonable progress to the solution of the problem, as evidenced by the values of R_1 and R_2 in Table 4.1. We believe that a more careful implementation, including a better QP solver, should improve the accuracy (especially in higher dimensions) and eliminate failures of nonsatisfaction of the stopping rule (4.4). To this end, we observed that in most cases, the values of R_1 and R_2 (which measure actual proximity to the solution) are very satisfactory, and close to those reported at termination, well before the stopping rule (4.4) is activated or the maximal number of oracle calls is reached. To some extent, this is quite normal for bundle methods, as they have to generate enough information in order to recognize optimality of the current point. For example, even starting with x^0 = x̄, about 20 oracle calls were required in our experiments before the method stopped according to (4.4). But in some cases, even when the values in the stopping test (4.4) are already quite close to the required tolerances relatively early, it proves difficult to get more precision and satisfy (4.4). As already stated, we believe that the QP solver used in our implementation is likely the main reason we were not able to progress to higher accuracy with respect to the stopping test (4.4). In any case, we believe that Table 4.1 shows reasonable behavior of Algorithm 2.1, even in our simple implementation, on problems with complementarity constraints (which is a difficult class of problems). Finally, we observe that degeneracy of the matrix Q defining the complementarity constraints is not a problem for our algorithm at all. Actually, problems with higher degeneracy of Q appear even easier to solve. We conjecture that the reason for this is that, in the case of high degeneracy of Q, the feasible set of (4.1) is larger and the function f_2 is easier to minimize. This may make the overall problem easier to deal with in our setting.

5. Concluding remarks.
We have presented a bundle method for solving a nonsmooth convex bilevel problem, which includes standard nonsmooth constrained optimization as a special case. The attractive feature of the method is that it is completely explicit. In particular, it does not require an iterative solution (not even approximate) of any optimization subproblems with general structure. Moreover, in the case of constrained optimization, no constraint qualifications are required for convergence.

Acknowledgment. The author thanks Claudia Sagastizábal for her MATLAB unconstrained bundle code, which served as the basis for the implementation of Algorithm 2.1.

REFERENCES

[1] A. Auslender, Numerical methods for nondifferentiable convex optimization, Math. Program. Stud., 30 (1987).
[2] D. P. Bertsekas, Convex Analysis and Optimization, Athena Scientific, Belmont, MA.
[3] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal, Numerical Optimization: Theoretical and Practical Aspects, Springer-Verlag, Berlin.
[4] A. Cabot, Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization, SIAM J. Optim., 15 (2005).
[5] R. Cominetti and M. Courdurier, Coupling general penalty schemes for convex programming with the steepest descent and the proximal point algorithm, SIAM J. Optim., 13 (2002).
[6] R. Correa and C. Lemaréchal, Convergence of some algorithms for convex minimization, Math. Program., 62 (1993).
[7] M. Kočvara and J. V. Outrata, Optimization problems with equilibrium constraints and their numerical solution, Math. Program., 101 (2004).
[8] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968.

[9] R. Fletcher and S. Leyffer, A Bundle Filter Method for Nonsmooth Nonlinear Optimization, Numerical Analysis Report NA/195, Department of Mathematics, The University of Dundee, Scotland.
[10] A. Frangioni, Solving semidefinite quadratic optimization problems within nonsmooth optimization problems, Comput. Oper. Res., 23 (1996).
[11] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, Springer-Verlag, Berlin.
[12] V. V. Kalashnikov and N. I. Kalashnikova, Solving two-level variational inequality. Hierarchical and bilevel programming, J. Global Optim., 8 (1996).
[13] E. Karas, A. Ribeiro, C. Sagastizábal, and M. Solodov, A bundle-filter method for nonsmooth convex constrained optimization, Math. Program., to appear.
[14] K. C. Kiwiel, An exact penalty function algorithm for nonsmooth convex constrained minimization problems, IMA J. Numer. Anal., 5 (1985).
[15] K. C. Kiwiel, Methods of Descent for Nondifferentiable Optimization, Lecture Notes in Math. 1133, Springer-Verlag, Berlin.
[16] K. C. Kiwiel, A method for solving certain quadratic programming problems arising in nonsmooth optimization, IMA J. Numer. Anal., 6 (1986).
[17] K. C. Kiwiel, A constraint linearization method for nondifferentiable convex minimization, Numer. Math., 51 (1987).
[18] K. C. Kiwiel, Exact penalty functions in proximal bundle methods for constrained convex nondifferentiable minimization, Math. Programming, 52 (1991).
[19] C. Lemaréchal, A. Nemirovskii, and Yu. Nesterov, New variants of bundle methods, Math. Programming, 69 (1995).
[20] Z.-Q. Luo, J.-S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, UK.
[21] R. Mifflin, An algorithm for constrained optimization with semismooth functions, Math. Oper. Res., 2 (1977).
[22] R. Mifflin, A modification and extension of Lemarechal's algorithm for nonsmooth minimization, Math. Programming Stud., 17 (1982).
[23] B. T. Polyak, Introduction to Optimization, Optimization Software, Inc., Publications Division, New York.
[24] C. Sagastizábal and M. Solodov, An infeasible bundle method for nonsmooth convex constrained optimization without a penalty function or a filter, SIAM J. Optim., 16 (2005).
[25] M. V. Solodov, An explicit descent method for bilevel convex optimization, J. Convex Anal., 14 (2007).


More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces Abstract and Appied Anaysis Voume 01, Artice ID 846396, 13 pages doi:10.1155/01/846396 Research Artice Numerica Range of Two Operators in Semi-Inner Product Spaces N. K. Sahu, 1 C. Nahak, 1 and S. Nanda

More information

Pattern Frequency Sequences and Internal Zeros

Pattern Frequency Sequences and Internal Zeros Advances in Appied Mathematics 28, 395 420 (2002 doi:10.1006/aama.2001.0789, avaiabe onine at http://www.ideaibrary.com on Pattern Frequency Sequences and Interna Zeros Mikós Bóna Department of Mathematics,

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

Lecture 6: Moderately Large Deflection Theory of Beams

Lecture 6: Moderately Large Deflection Theory of Beams Structura Mechanics 2.8 Lecture 6 Semester Yr Lecture 6: Moderatey Large Defection Theory of Beams 6.1 Genera Formuation Compare to the cassica theory of beams with infinitesima deformation, the moderatey

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department Physics 8.07: Eectromagnetism II October 7, 202 Prof. Aan Guth LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA) 1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

How many random edges make a dense hypergraph non-2-colorable?

How many random edges make a dense hypergraph non-2-colorable? How many random edges make a dense hypergraph non--coorabe? Benny Sudakov Jan Vondrák Abstract We study a mode of random uniform hypergraphs, where a random instance is obtained by adding random edges

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

Konrad-Zuse-Zentrum für Informationstechnik Berlin Heilbronner Str. 10, D Berlin - Wilmersdorf

Konrad-Zuse-Zentrum für Informationstechnik Berlin Heilbronner Str. 10, D Berlin - Wilmersdorf Konrad-Zuse-Zentrum für Informationstechnik Berin Heibronner Str. 10, D-10711 Berin - Wimersdorf Raf Kornhuber Monotone Mutigrid Methods for Eiptic Variationa Inequaities II Preprint SC 93{19 (Marz 1994)

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

arxiv: v1 [math.ca] 6 Mar 2017

arxiv: v1 [math.ca] 6 Mar 2017 Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear

More information

FOURIER SERIES ON ANY INTERVAL

FOURIER SERIES ON ANY INTERVAL FOURIER SERIES ON ANY INTERVAL Overview We have spent considerabe time earning how to compute Fourier series for functions that have a period of 2p on the interva (-p,p). We have aso seen how Fourier series

More information

Noname manuscript No. (will be inserted by the editor) Can Li Ignacio E. Grossmann

Noname manuscript No. (will be inserted by the editor) Can Li Ignacio E. Grossmann Noname manuscript No. (wi be inserted by the editor) A finite ɛ-convergence agorithm for two-stage convex 0-1 mixed-integer noninear stochastic programs with mixed-integer first and second stage variabes

More information

Homogeneity properties of subadditive functions

Homogeneity properties of subadditive functions Annaes Mathematicae et Informaticae 32 2005 pp. 89 20. Homogeneity properties of subadditive functions Pá Burai and Árpád Száz Institute of Mathematics, University of Debrecen e-mai: buraip@math.kte.hu

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS J App Prob 40, 226 241 (2003) Printed in Israe Appied Probabiity Trust 2003 A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS SUNDER SETHURAMAN, Iowa State University Abstract Let X 1,X 2,,X n be a sequence

More information

17 Lecture 17: Recombination and Dark Matter Production

17 Lecture 17: Recombination and Dark Matter Production PYS 652: Astrophysics 88 17 Lecture 17: Recombination and Dark Matter Production New ideas pass through three periods: It can t be done. It probaby can be done, but it s not worth doing. I knew it was

More information

Tikhonov Regularization for Nonlinear Complementarity Problems with Approximative Data

Tikhonov Regularization for Nonlinear Complementarity Problems with Approximative Data Internationa Mathematica Forum, 5, 2010, no. 56, 2787-2794 Tihonov Reguarization for Noninear Compementarity Probems with Approximative Data Nguyen Buong Vietnamese Academy of Science and Technoogy Institute

More information

Formulas for Angular-Momentum Barrier Factors Version II

Formulas for Angular-Momentum Barrier Factors Version II BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

SVM: Terminology 1(6) SVM: Terminology 2(6)

SVM: Terminology 1(6) SVM: Terminology 2(6) Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are

More information

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased PHYS 110B - HW #1 Fa 2005, Soutions by David Pace Equations referenced as Eq. # are from Griffiths Probem statements are paraphrased [1.] Probem 6.8 from Griffiths A ong cyinder has radius R and a magnetization

More information

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen

More information

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai

More information

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro

More information

Multilayer Kerceptron

Multilayer Kerceptron Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,

More information

Regularization for Nonlinear Complementarity Problems

Regularization for Nonlinear Complementarity Problems Int. Journa of Math. Anaysis, Vo. 3, 2009, no. 34, 1683-1691 Reguarization for Noninear Compementarity Probems Nguyen Buong a and Nguyen Thi Thuy Hoa b a Vietnamse Academy of Science and Technoogy Institute

More information

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case Technica Report PC-04-00 Anaysis of Emerson s Mutipe Mode Interpoation Estimation Agorithms: The MIMO Case João P. Hespanha Dae E. Seborg University of Caifornia, Santa Barbara February 0, 004 Anaysis

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES ANDRZEJ DUDEK AND ANDRZEJ RUCIŃSKI Abstract. For positive integers k and, a k-uniform hypergraph is caed a oose path of ength, and denoted by

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY

SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY SINGLE BASEPOINT SUBDIVISION SCHEMES FOR MANIFOLD-VALUED DATA: TIME-SYMMETRY WITHOUT SPACE-SYMMETRY TOM DUCHAMP, GANG XIE, AND THOMAS YU Abstract. This paper estabishes smoothness resuts for a cass of

More information

On the evaluation of saving-consumption plans

On the evaluation of saving-consumption plans On the evauation of saving-consumption pans Steven Vanduffe Jan Dhaene Marc Goovaerts Juy 13, 2004 Abstract Knowedge of the distribution function of the stochasticay compounded vaue of a series of future

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5].

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5]. PRODUCTS OF NEARLY HOLOMORPHIC EIGENFORMS JEFFREY BEYERL, KEVIN JAMES, CATHERINE TRENTACOSTE, AND HUI XUE Abstract. We prove that the product of two neary hoomorphic Hece eigenforms is again a Hece eigenform

More information

Intuitionistic Fuzzy Optimization Technique for Nash Equilibrium Solution of Multi-objective Bi-Matrix Games

Intuitionistic Fuzzy Optimization Technique for Nash Equilibrium Solution of Multi-objective Bi-Matrix Games Journa of Uncertain Systems Vo.5, No.4, pp.27-285, 20 Onine at: www.jus.org.u Intuitionistic Fuzzy Optimization Technique for Nash Equiibrium Soution of Muti-objective Bi-Matri Games Prasun Kumar Naya,,

More information

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES by Michae Neumann Department of Mathematics, University of Connecticut, Storrs, CT 06269 3009 and Ronad J. Stern Department of Mathematics, Concordia

More information

Volume 13, MAIN ARTICLES

Volume 13, MAIN ARTICLES Voume 13, 2009 1 MAIN ARTICLES THE BASIC BVPs OF THE THEORY OF ELASTIC BINARY MIXTURES FOR A HALF-PLANE WITH CURVILINEAR CUTS Bitsadze L. I. Vekua Institute of Appied Mathematics of Iv. Javakhishvii Tbiisi

More information

Haar Decomposition and Reconstruction Algorithms

Haar Decomposition and Reconstruction Algorithms Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

<C 2 2. λ 2 l. λ 1 l 1 < C 1

<C 2 2. λ 2 l. λ 1 l 1 < C 1 Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima

More information

Tight Approximation Algorithms for Maximum Separable Assignment Problems

Tight Approximation Algorithms for Maximum Separable Assignment Problems MATHEMATICS OF OPERATIONS RESEARCH Vo. 36, No. 3, August 011, pp. 416 431 issn 0364-765X eissn 156-5471 11 3603 0416 10.187/moor.1110.0499 011 INFORMS Tight Approximation Agorithms for Maximum Separabe

More information