arxiv: v3 [math.oc] 25 Nov 2015
|
|
- Ella Janis Cooper
- 5 years ago
- Views:
Transcription
1 arxiv: v3 [math.oc] 5 Nov 015 On the von Neumann and Frank-Wolfe Algorithms with Away Steps Javier Peña Daniel Rodríguez Negar Soheili October 14, 018 Abstract The von Neumann algorithm is a simple coordinate-descent algorithm to determine whether the origin belongs to a polytope generated by a finite set of points. When the origin is in the interior of the polytope, the algorithm generates a sequence of points in the polytope that converges linearly to zero. The algorithm s rate of convergence depends on the radius of the largest ball around the origin contained in the polytope. We show that under the weaker condition that the origin is in the polytope, possibly on its boundary, a variant of the von Neumann algorithm that includes away steps generates a sequence of points in the polytope that converges linearly to zero. The new algorithm s rate of convergence depends on a certain geometric parameter of the polytope that extends the above radius but is always positive. Our linear convergence result and geometric insights also extend to a variant of the Frank-Wolfe algorithm with away steps for minimizing a convex quadratic function over a polytope. Tepper School of Business, Carnegie Mellon University, USA, jfp@andrew.cmu.edu Department of Mathematical Sciences, Carnegie Mellon University, USA, drod@cmu.edu College of Business Administration, University of Illinois at Chicago, USA, nazad@uic.edu 1
2 1 Introduction Assume A = [ ] a 1 a n R m n with a i = 1, i = 1,...,n. The von Neumann algorithm, communicated by von Neumann to Dantzig in the late 1940s and discussed later by Dantzig in an unpublished manuscript [7], is a simple algorithm to solve the feasibility problem: Is 0 conv(a) = conv{a 1,...,a n }? More precisely, the algorithm aims to find an approximate solution to the problem Ax = 0, x n 1 = {x R n + : x 1 = 1}. (1) The algorithm starts from an arbitrary point x 0 n 1. At the k-th iteration the algorithm updates the current trial solution x k n 1 as follows. First, it finds the column a j of A that forms the widest angle with y k := Ax k. If this angle is acute, i.e., A T y k > 0, then the algorithm halts as the vector y k separates the origin from conv(a). Otherwise the algorithm chooses x k+1 n 1 so that y k+1 := Ax k+1 is the minimum-norm convex combination of Ax k and a j. Let e j n 1 denote the n-dimensional vector with j-th component equal to one and all other components equal to zero. To ease notation, we shall write for throughout the paper. Von Neumann Algorithm 1. pick x 0 n 1 ; put y 0 := Ax 0 ; k := 0.. for k = 0,1,,... if A T y k > 0 then HALT: 0 conv(a) j := argmin a i,y k ; i=1,...,n θ k := argmin θ [0,1] { y k +θ(a j y k ) }; x k+1 := (1 θ k )x k +θ k e j ; y k+1 := Ax k+1 ; end for The von Neumann algorithm can be seen as a kind of coordinatedescent method for finding a solution to (1): At each iteration the algorithm judiciously selects a coordinate j and increases the weight ofthe j-th componentofx k while decreasingallofthe othersvia alinesearch step. Like other currently popular coordinate-descent and firstorder methods for convex optimization, the main attractive features of the von Neumann algorithm are its simplicity and low computational cost per iteration. Another attractive feature is its convergence rate. Epelman and Freund [8] showed that the speed of convergence of the
3 von Neumann algorithm can be characterized in terms of the following condition measure of the matrix A: ρ(a) := z R m, z =1 min a i,z. () i=1,...,n The condition measure ρ(a) was introduced by Goffin [13] and later independently studied by Cheung and Cucker [4]. The latter set of authors showed that ρ(a) is also a certain distance to ill-posedness in the spirit introduced and developed by Renegar [1, ]. Observe that ρ(a) can also be written as ρ(a) = z R m, z =1 min A T z,v = v n 1 z R m, z =1 min z,av. (3) v n 1 Hence ρ(a) > 0 if and only if 0 conv(a) and ρ(a) < 0 if and only if 0 int(conv(a)). When ρ(a) > 0, this condition measure is closely related to the concept of margin in binary classification [5] and with the minimum enclosing ball problem in computational geometry [6]. The quantity ρ(a) also has the following geometric interpretation as discussed in [3, Proposition 6.8]. If ρ(a) > 0 then from (3) and Lagrangian duality we get ρ(a) = z R m, z 1 = min min z,av v n 1 z,av v n 1 z R m, z 1 = min{ y : y conv(a)} = dist(0, conv(a)). (4) On the other hand, if ρ(a) 0 then (3) yields ρ(a) = ρ(a) = min z,av z R m, z =1v n 1 = {r : y r y conv(a)} = dist(0, conv(a)). (5) In either case ρ(a) = dist(0, conv(a)). Furthermore, observe that under the assumption A = [ a 1 a n ] R m n with a i = 1, i = 1,...,n it follows that z,av 1 for all z R m, z = 1 and v n 1. In particular, from (3) it follows that ρ(a) 1. Epelman and Freund [8] showed the following properties of the von Neumann algorithm. When ρ(a) < 0 the algorithm generates iterates x k n 1, k = 1,,... such that Ax k ( 1 ρ(a) ) k Ax0. (6) On the other hand, the iterates x k n 1 also satisfy Ax k 1 k as long as the algorithm has not halted. In particular, if ρ(a) > 0 3
4 then by (4) the algorithm must halt with a certificate of infeasibility A T 1 y k > 0 for 0 conv(a) in at most ρ(a) iterations. The latter bound is identical to a classical convergence bound for the perceptron algorithm [, 0]. This is not a coincidence as there is a nice duality between the perceptron and the von Neumann algorithms [19, 3]. We show that a variant of the von Neumann algorithm with away steps has the following stronger convergence properties. When 0 conv(a), possibly on its boundary, the algorithm generates a sequence x k n 1, k = 1,,... satisfying Ax k ) k/ (1 w(a) Ax 0. (7) 16 The quantity w(a) is a kind of relative width of conv(a) that is at least as large as ρ(a). However, unlike ρ(a) the relative width w(a) is positive for any non-zero matrix A R m n provided 0 conv(a). When ρ(a) > 0, or equivalently 0 conv(a), the von Neumann algorithm with away steps finds a certificate of infeasibility A T y k > 0 for 8 0 conv(a) in at most ρ(a) iterations. Figure 1 illustrates the different behavior of the von Neumann algorithmand [ ] the variant with awaysteps described in Section for A = The figure depicts the path of iterates {y k : k = 0,1,...} generated by each algorithm starting from y 0 = [ 1 0]. The zig-zagging behavior in the first case occurs because after the third iteration the search direction is nearly perpendicular to the current iterate and as a consequence the algorithm makes slow progress. By contrast, in the second case the away steps provide alternative search directions that enable the algorithm to make faster progress. The von Neumann algorithm can be seen as a special case of the Frank-Wolfe (also known as conditional gradient) algorithm [9, 16]. The von Neumann algorithm is also nearly identical to an algorithm for minimizing a quadratic form over a convex set independently developed by Gilbert [1]. The name Gilbert s algorithm appears to be more popular in the computational geometry literature [11]. We show that a linear convergence result similar to (7) also holds for a version of the Frank-Wolfe algorithm with away steps for minimizing a strongly convex quadratic function over a polytope. This variant of the Frank-Wolfe algorithm with away steps was introduced by Wolfe [6] and has been subsequently studied by various authors. In particular, linear convergence results similar to ours have been previously established in [10, 15, 16, 17] and more recently in [1]. Linear convergence results in the same spirit also hold for the randomized Kaczmarz algorithm[4] and for the methods of randomized coordinate 4
5 Figure 1: Iterates generated by the von Neumann algorithm (top) and its variant with away steps (bottom) descent and iterated projections [18]. The computational article [14] also reports numerical experiments for variants of the von Neumann algorithm with away steps. Our main contributions are the succinct and transparent proofs of linear convergence results that highlight the role of the relative width w(a) and a closely related restricted width φ(a). Our presentation unveils a deep connection between problem conditioning as encompassed by the quantities w(a), φ(a) and the behavior of the von Neumann and Frank-Wolfe algorithms with away steps. We also provide some lower bounds on w(a) and φ(a) in terms of certain radii quantities that naturally extend ρ(a). We note that the linear convergence results in [17] are stated in terms of a certain pyramidal width whose geometric intuition and properties appear to be less understood than those of w(a) and φ(a). The rest of the paper is organized as follows. In Section we describe a von Neumann Algorithm with Away Steps and establish its main convergence result in terms of the relative width w(a). Section 3 extends our main result to the more general problem of minimizing a quadratic function over the polytope conv(a). Finally, Section 4 discusses some properties of the relative and restricted widths. 5
6 Von Neumann Algorithm with Away Steps Throughout this section we assume A = [ ] a 1 a n R m n with a i = 1, i = 1,...,n. We next consider a variant of the von Neumann Algorithm that includes so-called away steps. To that end, at each iteration, in addition to a regular step the algorithm considers an alternative away step. Each of these away steps identifies a coordinate l such that the l-th component of x k is positive and decreases the weight of the l-th component of x k. The algorithm needs to keep track of the support, that is, the set of positive entries of a vector. To that end, given x R n +, let the support of x be defined as S(x) := {i {1,...,n} : x i > 0}. Von Neumann Algorithm with Away Steps 1. pick x 0 n 1 ; put y 0 := Ax 0 ; k := 0;.. for k = 0,1,,... if A T y k > 0 then HALT: 0 conv(a) j := argmin a i,y k ; l := arg a i,y k ; i=1,...,n i S(x k ) if a j y k,y k < y k a l,y k then (regular step) a := a j y k ; u := e j x k ; θ := 1 else (away step) a := y k a l ; u := x k e l ; θ := (x k) l 1 (x k ) l endif θ k := argmin { y k +θa }; θ [0,θ ] x k+1 := x k +θ k u; y k+1 := Ax k+1 end for Note that the above von Neumann Algorithm with Away Steps can also be applied to any non-zero matrix A = [ a 1 a n ]. The assumption that the columns of A are normalized, i.e., a i = 1, i = 1,...,n, simplifies our notation and exposition. In Section 3 below we extend our discussion to the case when the columns of A are not necessarily normalized. Observe that the iterates x k,k = 0,1,..., generated by the above von Neumann Algorithm with Away Steps satisfy x k n 1. This fact follows by induction: By construction, x 0 n 1. At iteration k we have x k+1 = x k +θ k u where x k n 1 and the components of u add up to zeroas uis either e j x k ore l x k. The bound θ k [0,θ ] in turn guarantees that x k+1 0 and so x k+1 1 = x k 1 = 1. 6
7 Define the relative width w(a) of conv(a) as { } w(a) := min Ax,al a j : l S(x), j {1,...,n}. x 0,Ax 0 l,j Ax (8) The next proposition shows that w(a) ρ(a) when 0 conv(a). To that end, observe that w(a) can also be written as { } Ax,Au Av w(a) = min : u,v n 1,S(u) S(x). x 0,Ax 0 u,v Ax (9) Proposition 1 If A is such that 0 conv(a) then w(a) ρ(a). Proof: Since 0 conv(a), equation (3) yields In particular, ρ(a) = z R m, z =1 min z,av 0. v n 1 ρ(a) = Hence from (9) we get min z, Av z R m, z =1v n 1 min Ax, Av. x 0,Ax 0v n 1 Ax (10) w(a) min Ax, Av ρ(a). x 0,Ax 0v n 1 Ax The first inequality holds because we can choose u = x x 1 in (9). The second inequality follows from (10). Observe that under the assumption A = [ ] a 1 a n R m n with a i = 1, i = 1,...,n it follows that Au Av for all u,v n 1. In particular, from (9) it follows that w(a). In Section 4 below we discuss some additional properties of w(a). In particular,wewillformallyprovethatw(a) > 0foranynonzeromatrix A R m n such that 0 conv(a). We are now ready to state the main properties of the von Neumann algorithm with away steps. Theorem 1 Assume x 0 n 1 is one of the extreme points of n 1. (a) If 0 conv(a) then the iterates x k n 1,y k = Ax k, k = 1,,... generated by the von Neumann Algorithm with Away Steps satisfy y k ) k/ (1 w(a) y
8 (b) The iterates x k n 1,y k = Ax k, k = 1,,... generated by the von Neumann Algorithm with Away Steps also satisfy y k 8 k as long as the algorithm has not halted. In particular, if 0 conv(a) then the von Neumann Algorithm with Away Steps finds a certificate of infeasibility A T y k > 0 for 0 conv(a) in at most 8 ρ(a) iterations. The crux of the proof of Theorem 1 is the following elementary lemma. Lemma 1 Assume a,y R m satisfy a,y < 0. Then min θ 0 y+θa = y a,y a, and the minimum is attained at θ = a,y a. Proof of Theorem 1: (a) The algorithm generates y k+1 by solving a problem of the form y k+1 = min θ [0,θ ] y k +θa where a = a j y k or a = y k a l is chosen so that a,y k = min{ y k a l,y k, a j y k,y k }. In particular, a,y k 1 ( a l y k,y k + y k a j,y k ) = 1 a l a j,y k (11) 1 w(a) y k. If θ k < θ then Lemma 1 applied to y := y k yields y k+1 = y k a,y k a y k w(a) 16 y k. The second inequality follows from (11) and a 1+ y k. Thus each time the algorithm performs an iteration with θ k < θ, the value of y k decreasesat least by the factor 1 w(a) 16. Toconclude, itsufficestoshowthataftern iterationsthenumber ofiterationswithθ k < θ isatleastn/. Tothatend, weapply the following argument from [17]: Observe that when θ k = θ 8
9 wehave S(x k+1 ) < S(x k ). On the other hand, when θ k < θ wehave S(x k+1 ) S(x k ) +1. Since S(x 0 ) = 1 and S(x) 1 for every x n 1, after any number of iterations there must have been at least as many iterations with θ k < θ as there have been iterations with θ k = θ. Hence after N iterations, the number of iterations with θ k < θ is at least N/. (b) Proceed as above but note that if the algorithm does not halt at the k-th iteration then a,y k a j y k,y k y k. Thus each time the algorithm performs an iteration with θ k < θ, we have y k+1 y k a,y k a y k y k 4 4. (1) Assume the algorithm has not halted after N iterations. Let m be the number of iterations with θ k < θ up to iteration N. If y N 4 m and θ N < θ then from (1) we get y N+1 4 m 4 m = 4(m 1) m 4 m+1. It follows by induction that if the algorithm has not halted after N iterations then y N 4 m. As in part (a), it must be the case that m N and consequently y N 8 N. Finally, if 0 conv(a) then ρ(a) = min{ y : y conv(a)} > 0 and so the algorithm must halt with a certificate of infeasibility A T y k > 0 8 for 0 conv(a) after at most ρ(a) iterations. 3 Frank-Wolfe Algorithm with Away Steps Throughout this section assume A = [ a 1 a n ] R m n is a nonzero matrix, and f(y) = 1 y,qy + b,y for a symmetric positive definite matrix Q R m m and b R m. Consider the problem min f(y) min f(ax). (13) y conv(a) x n 1 Observe that in contrast to Section, we do not assume that the columns of A are normalized in this section. Problem (1) can be seen as a special case of (13) when Q = I and b = 0. The von Neumann Algorithm can also be seen as a special case of the Frank-Wolfe Algorithm [9] for (13). This section extends the ideas and results from Section to the following variant of the Frank-Wolfe algorithm with away steps. This variant can be traced 9
10 back to Wolfe [6]. It has been a subject of study in a number of papers [1, 10, 14, 15, 17]. Frank-Wolfe Algorithm with Away Steps 1. pick x 0 n 1 ; put y 0 := Ax 0 ; k := 0;.. for k = 0,1,,... j := argmin i=1,...,n a i, f(y k ) ; l := arg i S(x k ) a i, f(y k ) ; if a j y k, f(y k ) < y k a l, f(y k ) then (regular step) a := a j y k ; u := e j x k ; θ := 1 else (away step) a := y k a l ; u := x k e l ; θ := (x k) l 1 (x k ) l endif θ k := argmin f(y k +θa) θ [0,θ ] x k+1 := x k +θ k u; y k+1 := Ax k+1 end for Observe that the computation of θ k in the second to last step reduces to minimizing a one-dimensional convex quadratic function over the interval [0,θ ]. WenextpresentageneralversionofTheorem1fortheaboveFrank- Wolfe Algorithm with Away Steps. The linear convergence result depends on a certain restricted width and diameter defined as follows. For x 0 with Ax 0 let φ(a,x) := { sup λ > 0 : u,v n 1, S(u) S(x), Au Av = λ } Ax Ax. Define the restricted width φ(a) and diameter d(a) of conv(a) as follows. φ(a) := min{φ(a,x) : x 0, Ax 0}, (14) x and Observe that for x 0 with Ax 0 { Ax,Au Av φ(a,x) u,v Ax d(a) := x,u n 1 Ax Au. (15) } : u,v n 1,S(u) S(x). Thus (9) and (14) imply that w(a) φ(a) for all nonzero A R m n. Furthermore, the restricted width φ(a) can be seen as an extension 10
11 of the radius ρ(a) defined in (). Indeed, when 0 int(conv(a)), we have span(a) = R m. Hence (5) can alternatively be written as { ρ(a) := min λ : v n 1, Av = λ } x 0,Ax 0 Ax Ax. This implies that φ(a,x) ρ(a) + Ax x 1 for all x 0 with Ax 0. Hence the following inequality readily follows φ(a) ρ(a). Section 4 presents a stronger lower bound on φ(a) in terms of certain variants of ρ(a). In particular, we will show that φ(a) > 0, and consequently w(a) > 0, for any nonzero matrix A R m n such that 0 conv(a). The linear convergence property of the von Neumann algorithm with away steps, as stated in Theorem 1(a), extends as follows. Theorem Assume x n 1 is a minimizer of (13). Let y = Ax and Ā := Q1/[ a 1 y a n y ]. If x 0 n 1 is one of the extreme points of n 1 then the iterates x k n 1,y k = Ax k, k = 1,,... generated by the Frank-Wolfe Algorithm with Away Steps satisfy ) k/ f(y k ) f(y ) (1 φ(ā) (f(y 0 ) f(y )). (16) 4d(Ā) The proof of Theorem relies on the following two lemmas. The first one is similar to Lemma 1 and also follows via a straightforward calculation. Lemma Assume f is as above and a,y R m satisfy a, f(y) < 0. Then a, f(y) minf(y +θa) = f(y) θ 0 a,qa, and the minimum is attained at θ = a, f(y) a,qa. Lemma 3 Assume f,a,y,ā all x n 1 are as in Theorem above. Then for l S(x),j=1,...,n f(ax),a l a j φ(ā) (f(ax) f(y )). Proof: Let y := Ax conv(a). Assume y y as otherwise there is nothing to show. Since y minimizes (13), we have f(y ),y y 0. For ease of notation put δ := f(y ),y y and y y Q := y y,q(y y ). It readily follows that f(y),y y = y y Q +δ 0, 11
12 and Hence, (f(y) f(y )) = y y Q +δ. f(y),y y = ( y y Q +δ) y y Q ( y y Q +δ) = y y Q (f(y) f(y )). Thus f(y),y y y y Q (f(y) f(y )). (17) On the other hand, by the definition of φ(a) there exist u,v n 1 λ with S(u) S(x) and λ φ(ā) such that Āu Āv = Āx Āx. Since Āx = Q 1/ (Ax y ) = Q 1/ (y y ), the latter equation can be rewritten as λ Au Av = y y (y y ). (18) Q Putting (17) and (18) together we get f(y),au Av = λ f(y),y y y y Q φ(ā) (f(y) f(y )). To finish, observe that f(ax),a l a j f(y),au Av l S(x),j=1,...,n φ(ā) (f(ax) f(y )). Proof of Theorem : This is a modification of the proof of Theorem 1(a). At iteration k the algorithm yields y k+1 such that f(y k+1 ) = min θ [0,θ ] f(y k +θa) where a = a j y k or a = y k a l, and f(y k ),a > 1 f(y k),a l a j 1 φ(ā) (f(y k ) f(y ). The second inequality above follows from Lemma 3. If θ k < θ then Lemma applied to y := y k yields f(y k+1 ) = f(y k ) a, f(y k) a,qa f(y k ) φ(ā) 4d(Ā)(f(y k) f(y )). 1
13 That is, ) f(y k+1 ) f(y ) (1 φ(ā) (f(y k ) f(y )). 4d(Ā) Then proceeding as in the last part of the proof of Theorem 1(a) we obtain (16). Remark 1 A closer look at the proof of Theorem reveals that the convergence bound (16) can be sharpened as follows: Replace φ(ā) with w f (A) φ(ā), where w f(a) is the following extension of w(a): w f (A) := min x n 1 Ax y l,j { } f(ax),a l a j (f(ax) f(y )) : l S(x),j {1,...,n}. In the special case when Q = I,b = 0 problem (13) specializes to problem (1). In this case if 0 conv(a) then we have y = 0 and w f (A) = w(a). Hence the sharpened version of Theorem yields y k ) k/ (1 w(a) 4d(A) y 0. If in addition the columns of A are normalized then d(a) and we recover the bound in Theorem 1(a). We have the following related conjecture concerning w(a) and φ(a). Conjecture 1 If A R m n is non-zero and 0 conv(a) then φ(a) = w(a). 4 Some properties of the restricted width Throughout this section assume A R m n is a nonzero matrix. As we noted in Section 3 above, w(a) φ(a) and φ(a) ρ(a) when 0 int(conv(a)). Our next result establishes a stronger lower bound on φ(a) in terms of some quantities that generalize ρ(a) to the case when 0 conv(a). To that end, we recall some terminology and results from [5]. Assume A = [ ] a 1 a n R m n is a non-zero matrix. Then there exists a unique partition B N = {1,...,n} such that both A B x B = 0, x B > 0 and A T N y > 0, AT By = 0 are feasible. In particular, B if and only if 0 conv(a). Also N if and only if 0 relint(conv(a)). Furthermore, if a i = 0 then i B. The above canonical partition (B,N) allows us to refine the quantity ρ(a) defined by () as follows. Let L := span(a B ) and L := {v 13
14 R m : v,y = 0 for all y L}. By convention, L = {0} and L = R m when B =. If L {0}, let ρ B (A) be defined as ρ B (A) := z L, z =1 min i B a i,z. Observe that if B, then L = {0} only when a i = 0 for all i B. If N, let ρ N (A) be defined as ρ N (A) := z L, z =1 min a i,z. i N When L {0}, it can be shown [5] that ρ B (A) < 0. Likewise, when N it can be shown that ρ N (A) > 0. In particular, the latter implies that ρ N (A) := min a i,z = min a i,z, (19) i N i N z L, z =1 z L, z 1 where a i is the orthogonal projection of a i onto L. Let A N denote the matrix obtained by projecting each of the columns of A N onto L. From (19) and Lagrangian duality it follows that ρ N (A) = min{ y : y conv(a N)}. (0) Similarly, it can be shown that if L {0} then ρ B (A) = {r : y L, y r y conv(a B )}. (1) Observe that (0) and (1) nicely extend (4) and (5). Indeed, (0) is identical to (4) when B =. Likewise, (1) is identical to (5) when N = and rank(a) = m. Furthermore, (0) and (1) imply that ρ N (A) = dist(0, conv(a N )) and ρ B(A) = dist L (0, conv(a B )) thereby extending the fact that ρ(a) = dist(0, conv(a)). The next results show that φ(a) can be bounded below in terms of ρ B (A) and ρ N (A). In particular, Corollary 1 shows that w(a) φ(a) > 0 whenever A 0 and 0 conv(a). Theorem 3 Assume A = [ a 1 a n ] R m n is a nonzero matrix. (a) If N = then L {0} and φ(a) ρ B (A). (b) If B = then φ(ā) ρ N(A) for Ā := [ A 0 ]. (c) If B and L = {0} then φ(a) ρ N (A). (d) If N and L {0} then φ(a) ρ B(A) ρ N (A) where A +ρ N (A), A = a i. i=1,...,n 14
15 Proof: (a) Assume x 0 is such that y := Ax 0. In this case y span(a B ) = L. Hence L {0} and by (1) there exists v n 1 and r ρ B (A) such that Av = r Ax Ax. Thus for u := x x 1 we have u,v n 1, S(u) S(x) and Au Av = ( ) r+ Ax 1 x 1 Ax Ax.Itfollowsthatφ(A,x) r+ Ax x 1 > ρ B (A). [ x (b) Assume x := 0 is such that y := Ā x = Ax 0. From (0) t] [ x ] it follows that Ax x 1 ρ N (A). Thus for u := x 1, v := e n+1 0 Ax 1 we have u,v n 1, S(u) S( x) and Āu Āv = x 1 Ax Ax. Ax It follows that φ(ā, x) x 1 ρ N (A). (c) Since B and L = {0}, it follows that A B = 0 and the columns of A N are precisely the non-zero columns of A. Thus from part (b) we get φ ([ A N 0 ]) ρ N (A). To finish, observe that φ(a) = φ( [ A N 0 ] ) because A B = 0. (d) Assume x 0 is such that y := Ax 0. Let L := span(a B ) and decompose y = y L + y where y = A N x N L and y L = A B x B + (A N A N )x N L. Put r := y y [0,1]. Assume r > 0 as otherwise y = y L span(a B ) and the statement holds with the better bound φ(a) ρ B (A) by proceeding exactly as in part (a). Since r > 0, we have x N 0. Put r N := y x N 1. From (0) it follows that r N ρ N (A). Next, ( 1 put w := x (AN N 1 A N )x ) N y L. Observe that w a i a i + y L A + r N 1 r and w L. Hence i N x N 1 r by (1) there exists x B 0, x B 1 = 1 such that A B x B = cw, where c := Taking x N := c x N 1 x N we get A N x N A B x B = ρ B (A) r r A +r N 1 r (0,1). c x N 1 (y +y L ) = ρ B (A) r N r A +r N 1 r y y. Thus letting u := (1 c)x + (0, x N ), v = ( x B,0) we get u,v n 1, S(u) S(x) and ( ) ρ B (A) r N Ax Au Av = (1 c) Ax + r A +r N 1 r Ax. () 15
16 Next, observe that ρ B (A) r N (1 c) Ax + r A +r N 1 r ρ B (A) r N r A +r N 1 r ρ B (A) r N A +rn ρ B (A) ρ N (A) A +ρ N (A). (3) The first inequality above follows because c (0,1), the second one follows from r [0,1] ( r A +r N 1 r ) = A +r N, andthethirdonefollowsfromr N ρ N (A). Putting()and(3) together we get φ(a,x) ρ B(A) ρ N (A) A +ρ N (A). Corollary 1 Assume A = [ a 1 a n ] R m n is a nonzero matrix and 0 conv(a). Then w(a) φ(a) > 0. Proof: Apply Theorem 3. Since 0 conv(a), we have B and thus case(b) cannot occur. Ifcase(a) occursthen φ(a) ρ B (A) > 0since ρ B (A) < 0 as L {0}. If case (c) occurs then φ(a) ρ N (A) > 0. Finally, if case (d) occurs then φ(a) ρ B(A) ρ N (A) > 0, since A +ρ N (A) both ρ N (A) > 0 and ρ B (A) < 0 as L {0}. To finish, recall that w(a) φ(a) as established in Section 3. We conclude with a few small examples that illustrate the values of φ(a), ρ B (A),ρ N (A) and their connection with the bounds in Theorem 3 for the three possible cases: N =, B =, and both B,N. Example 1 Assume ǫ,δ (0,1) and let [ ] A =. ǫ ǫ ǫ ǫ ǫδ ǫδ In this case B = {1,,3,4,5,6}, N =. It is easy to see that ρ B (A) = ǫ and φ(a) = φ(a, x) = (1+δ)ǫ for x = [ / 1/ ] T. [ ] 1 1 Example Assume δ (0,1) and let A =. In this case δ δ B =, N = {1,}. It is easy to see that ρ N (A) = δ and if we put Ā = [ A 0 ] then φ(ā) = φ(ā, x) = δ for x = [ 1/ 1/ 0 ] T. 16
17 Example 3 Assume ǫ,δ (0,1) and let A = ǫ ǫ ǫ ǫ δ δ In this case B = {1,,3,4}, N = {5,6}. It is easy to see that [ ] T 1 1 ǫ 0 0 (1+ǫ) (1+ǫ) 0 1+ǫ ρ B (A) = ǫ, ρ N (A) = δ. For x = we get A x = 0 0 ǫδ 1+ǫ It thus follows that φ(a) φ(a, x) = ǫδ 1+ǫ. On the other hand, Theorem 3 implies that in this case φ(a) ǫδ (1+ǫ,1+δ )+δ. In particular, ǫδ < φ(a) < ǫδ. Acknowledgements We are grateful to Simon Lacoste-Julien and Martin Jaggi for their comments on a preliminary draft of this paper. We are also grateful to two anonymous referees for their numerous constructive suggestions. The first author s research has been supported by NSF grant CMMI References [1] A. Beck and S. Shtern. Linearly convergent away-step conditional gradient for non-strongly convex functions. Technical report, Faculty of Industrial Engineering and Management, Technion, [] H. D. Block. The perceptron: A model for brain functioning. Reviews of Modern Physics, 34:13 135, 196. [3] P. Bürgisser and F. Cucker. Condition. Springer Berlin Heidelberg, 013. [4] D. Cheung and F. Cucker. A new condition number for linear programming. Math. Prog., 91(): , 001. [5] D. Cheung, F. Cucker, and J. Peña. On strata of degenerate polyhedral cones I: Condition and distance to strata. Eur. J. Oper. Res., 19(198):3 8, 009. [6] K. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63,
18 [7] G.B. Dantzig. An ǫ-precise feasible solution to a linear program with a convexity constraint in 1 ǫ iterations independent of problem size. Technical report, Stanford University, 199. [8] M. Epelman and R. M. Freund. Condition number complexity of an elementary algorithm for computing a reliable solution of a conic linear system. Math. Program., 88(3): , 000. [9] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Quarterly, 3:95 110, [10] D. Garber and E. Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. Technical report, Technical report, Faculty of Industrial Engineering and Management, Technion, 013. [11] B. Gärtner and M. Jaggi. Coresets for polytope distance. In SCG 09 Proceedings of the twenty-fifth annual symposium on Computational geometry, pages 33 4, 009. [1] E. Gilbert. An iterative procedure for computing the minimum of a quadratic form on a convex set. SIAM Journal on Control, 4(1):61 80, [13] J. Goffin. The relaxation method for solving systems of linear inequalities. Math. Oper. Res., 5: , [14] J. Gonçalves, R. Storer, and J. Gondzio. A family of linear programming algorithms based on an algorithm by von Neumann. Optimization Methods and Software, 4(3): , 009. [15] J. Guélat and P. Marcotte. Some comments on Wolfe s away step. Math. Program., 35: , [16] M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, volume 8 of JMLR Proceedings, pages , 013. [17] S. Lacoste-Julien and M. Jaggi. On the global linear convergence of Frank-Wolfe optimization variants. In Advances in Neural Information Processing Systems (NIPS), 015. [18] D. Leventhal and A. Lewis. Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res., 35: , 010. [19] D. Li and T. Terlaky. The duality between the perceptron algorithm and the von Neumann algorithm. In Modeling and Optimization: Theory and Applications (MOPTA) Conference, 013. [0] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615 6,
19 [1] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM J. on Optim., 5:506 54, [] J. Renegar. Linear programming, complexity theory and elementary functional analysis. Math. Program., 70:79 351, [3] N. Soheili and J. Peña. A primal dual smooth perceptron von Neumann algorithm. In K. Bezdek, Y. Ye, and A. Deza, editors, Fields Institute Communications Series on Discrete Geometry and Optimization, volume 69, pages Springer, 013. [4] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15:6 5, 009. [5] V. Vapnik. Statistical Learning Theory. Wiley, [6] P. Wolfe. Convergence theory in nonlinear programming. In Integer and Nonlinear Programming. North-Holland, Amsterdam,
On the von Neumann and Frank-Wolfe Algorithms with Away Steps
On the von Neumann and Frank-Wolfe Algorithms with Away Steps Javier Peña Daniel Rodríguez Negar Soheili July 16, 015 Abstract The von Neumann algorithm is a simple coordinate-descent algorithm to determine
More informationSome preconditioners for systems of linear inequalities
Some preconditioners for systems of linear inequalities Javier Peña Vera oshchina Negar Soheili June 0, 03 Abstract We show that a combination of two simple preprocessing steps would generally improve
More informationA Deterministic Rescaled Perceptron Algorithm
A Deterministic Rescaled Perceptron Algorithm Javier Peña Negar Soheili June 5, 03 Abstract The perceptron algorithm is a simple iterative procedure for finding a point in a convex cone. At each iteration,
More informationPolytope conditioning and linear convergence of the Frank-Wolfe algorithm
Polytope conditioning and linear convergence of the Frank-Wolfe algorithm Javier Peña Daniel Rodríguez December 24, 206 Abstract It is known that the gradient descent algorithm converges linearly when
More informationTowards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins
Towards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins arxiv:1406.5311v1 [math.oc] 20 Jun 2014 Aaditya Ramdas Machine Learning Department Carnegie Mellon University aramdas@cs.cmu.edu
More informationA data-independent distance to infeasibility for linear conic systems
A data-independent distance to infeasibility for linear conic systems Javier Peña Vera Roshchina May 29, 2018 Abstract We offer a unified treatment of distinct measures of well-posedness for homogeneous
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationA Greedy Framework for First-Order Optimization
A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationAn algorithm to compute the Hoffman constant of a system of linear constraints
An algorithm to compute the Hoffman constant of a system of linear constraints arxiv:804.0848v [math.oc] 23 Apr 208 Javier Peña Juan Vera Luis F. Zuluaga April 24, 208 Abstract We propose a combinatorial
More informationA Polynomial Column-wise Rescaling von Neumann Algorithm
A Polynomial Column-wise Rescaling von Neumann Algorithm Dan Li Department of Industrial and Systems Engineering, Lehigh University, USA Cornelis Roos Department of Information Systems and Algorithms,
More informationSolving Conic Systems via Projection and Rescaling
Solving Conic Systems via Projection and Rescaling Javier Peña Negar Soheili December 26, 2016 Abstract We propose a simple projection and rescaling algorithm to solve the feasibility problem find x L
More informationPairwise Away Steps for the Frank-Wolfe Algorithm
Pairwise Away Steps for the Frank-Wolfe Algorithm Héctor Allende Department of Informatics Universidad Federico Santa María, Chile hallende@inf.utfsm.cl Ricardo Ñanculef Department of Informatics Universidad
More informationConvex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014
Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,
More informationAn Example with Decreasing Largest Inscribed Ball for Deterministic Rescaling Algorithms
An Example with Decreasing Largest Inscribed Ball for Deterministic Rescaling Algorithms Dan Li and Tamás Terlaky Department of Industrial and Systems Engineering, Lehigh University, USA ISE Technical
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationJournal of Complexity. On strata of degenerate polyhedral cones, II: Relations between condition measures
Journal of Complexity 26 (200) 209 226 Contents lists available at ScienceDirect Journal of Complexity journal homepage: www.elsevier.com/locate/jco On strata of degenerate polyhedral cones, II: Relations
More informationA Sampling Kaczmarz-Motzkin Algorithm for Linear Feasibility
A Sampling Kaczmarz-Motzkin Algorithm for Linear Feasibility Jamie Haddock Graduate Group in Applied Mathematics, Department of Mathematics, University of California, Davis Copper Mountain Conference on
More informationLimited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives
Limited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives Madeleine Udell Operations Research and Information Engineering Cornell University Based on joint work with Song
More informationarxiv: v1 [math.oc] 10 Oct 2018
8 Frank-Wolfe Method is Automatically Adaptive to Error Bound ondition arxiv:80.04765v [math.o] 0 Oct 08 Yi Xu yi-xu@uiowa.edu Tianbao Yang tianbao-yang@uiowa.edu Department of omputer Science, The University
More informationRescaling Algorithms for Linear Programming Part I: Conic feasibility
Rescaling Algorithms for Linear Programming Part I: Conic feasibility Daniel Dadush dadush@cwi.nl László A. Végh l.vegh@lse.ac.uk Giacomo Zambelli g.zambelli@lse.ac.uk Abstract We propose simple polynomial-time
More informationChapter 1. Preliminaries
Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between
More informationMixed-integer nonlinear programming, Conic programming, Duality, Cutting
A STRONG DUAL FOR CONIC MIXED-INTEGER PROGRAMS DIEGO A. MORÁN R., SANTANU S. DEY, AND JUAN PABLO VIELMA Abstract. Mixed-integer conic programming is a generalization of mixed-integer linear programming.
More information1 Introduction and preliminaries
Proximal Methods for a Class of Relaxed Nonlinear Variational Inclusions Abdellatif Moudafi Université des Antilles et de la Guyane, Grimaag B.P. 7209, 97275 Schoelcher, Martinique abdellatif.moudafi@martinique.univ-ag.fr
More informationLecture 5. Theorems of Alternatives and Self-Dual Embedding
IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationStrong Dual for Conic Mixed-Integer Programs
Strong Dual for Conic Mixed-Integer Programs Diego A. Morán R. Santanu S. Dey Juan Pablo Vielma July 14, 011 Abstract Mixed-integer conic programming is a generalization of mixed-integer linear programming.
More informationDISSERTATION. Submitted in partial fulfillment ofthe requirements for the degree of
repper SCHOOL OF BUSINESS DISSERTATION Submitted in partial fulfillment ofthe requirements for the degree of DOCTOR OF PHILOSOPHY INDUSTRIAL ADMINISTRATION (OPERATIONS RESEARCH) Titled "ELEMENTARY ALGORITHMS
More informationUniqueness of the Solutions of Some Completion Problems
Uniqueness of the Solutions of Some Completion Problems Chi-Kwong Li and Tom Milligan Abstract We determine the conditions for uniqueness of the solutions of several completion problems including the positive
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationA full-newton step infeasible interior-point algorithm for linear programming based on a kernel function
A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with
More informationA Proximal Method for Identifying Active Manifolds
A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the
More informationarxiv: v4 [math.oc] 5 Jan 2016
Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The
More informationSparse Optimization Lecture: Dual Certificate in l 1 Minimization
Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization
More informationConditioning of linear-quadratic two-stage stochastic programming problems
Conditioning of linear-quadratic two-stage stochastic programming problems W. Römisch Humboldt-University Berlin Institute of Mathematics http://www.math.hu-berlin.de/~romisch (K. Emich, R. Henrion) (WIAS
More informationRobust Farkas Lemma for Uncertain Linear Systems with Applications
Robust Farkas Lemma for Uncertain Linear Systems with Applications V. Jeyakumar and G. Li Revised Version: July 8, 2010 Abstract We present a robust Farkas lemma, which provides a new generalization of
More informationA NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES. Fenghui Wang
A NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES Fenghui Wang Department of Mathematics, Luoyang Normal University, Luoyang 470, P.R. China E-mail: wfenghui@63.com ABSTRACT.
More informationMcMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.
McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationA strongly polynomial algorithm for linear systems having a binary solution
A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationSpeeding up Chubanov s Basic Procedure
Speeding up Chubanov s Basic Procedure Kees Roos, Delft University of Technology, c.roos@tudelft.nl September 18, 2014 Abstract It is shown that a recently proposed method by Chubanov for solving linear
More informationA Note on Nonconvex Minimax Theorem with Separable Homogeneous Polynomials
A Note on Nonconvex Minimax Theorem with Separable Homogeneous Polynomials G. Y. Li Communicated by Harold P. Benson Abstract The minimax theorem for a convex-concave bifunction is a fundamental theorem
More information4TE3/6TE3. Algorithms for. Continuous Optimization
4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca
More information1. Introduction and background. Consider the primal-dual linear programs (LPs)
SIAM J. OPIM. Vol. 9, No. 1, pp. 207 216 c 1998 Society for Industrial and Applied Mathematics ON HE DIMENSION OF HE SE OF RIM PERURBAIONS FOR OPIMAL PARIION INVARIANCE HARVEY J. REENBER, ALLEN. HOLDER,
More informationDuality revisited. Javier Peña Convex Optimization /36-725
Duality revisited Javier Peña Conve Optimization 10-725/36-725 1 Last time: barrier method Main idea: approimate the problem f() + I C () with the barrier problem f() + 1 t φ() tf() + φ() where t > 0 and
More informationPrimal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization
Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department
More informationA Geometric Analysis of Renegar s Condition Number, and its interplay with Conic Curvature
A Geometric Analysis of Renegar s Condition Number, and its interplay with Conic Curvature Alexandre Belloni and Robert M. Freund April, 007 Abstract For a conic linear system of the form Ax K, K a convex
More information3.10 Lagrangian relaxation
3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the
More informationA Redundant Klee-Minty Construction with All the Redundant Constraints Touching the Feasible Region
A Redundant Klee-Minty Construction with All the Redundant Constraints Touching the Feasible Region Eissa Nematollahi Tamás Terlaky January 5, 2008 Abstract By introducing some redundant Klee-Minty constructions,
More informationA Parametric Simplex Algorithm for Linear Vector Optimization Problems
A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear
More informationOn the order of the operators in the Douglas Rachford algorithm
On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums
More informationOn the exponential convergence of. the Kaczmarz algorithm
On the exponential convergence of the Kaczmarz algorithm Liang Dai and Thomas B. Schön Department of Information Technology, Uppsala University, arxiv:4.407v [cs.sy] 0 Mar 05 75 05 Uppsala, Sweden. E-mail:
More informationLecture 23: November 19
10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo
More informationA Simple Derivation of a Facial Reduction Algorithm and Extended Dual Systems
A Simple Derivation of a Facial Reduction Algorithm and Extended Dual Systems Gábor Pataki gabor@unc.edu Dept. of Statistics and OR University of North Carolina at Chapel Hill Abstract The Facial Reduction
More informationI.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010
I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0
More informationOn Sublinear Inequalities for Mixed Integer Conic Programs
Noname manuscript No. (will be inserted by the editor) On Sublinear Inequalities for Mixed Integer Conic Programs Fatma Kılınç-Karzan Daniel E. Steffy Submitted: December 2014; Revised: July 2015 Abstract
More informationA Second Full-Newton Step O(n) Infeasible Interior-Point Algorithm for Linear Optimization
A Second Full-Newton Step On Infeasible Interior-Point Algorithm for Linear Optimization H. Mansouri C. Roos August 1, 005 July 1, 005 Department of Electrical Engineering, Mathematics and Computer Science,
More informationSome Results Concerning Uniqueness of Triangle Sequences
Some Results Concerning Uniqueness of Triangle Sequences T. Cheslack-Postava A. Diesl M. Lepinski A. Schuyler August 12 1999 Abstract In this paper we will begin by reviewing the triangle iteration. We
More informationThe Frank-Wolfe Algorithm:
The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology
More informationWe describe the generalization of Hazan s algorithm for symmetric programming
ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,
More informationPrimal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization
Primal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization Robert M. Freund M.I.T. June, 2010 from papers in SIOPT, Mathematics
More informationarxiv: v1 [cs.cc] 5 Dec 2018
Consistency for 0 1 Programming Danial Davarnia 1 and J. N. Hooker 2 1 Iowa state University davarnia@iastate.edu 2 Carnegie Mellon University jh38@andrew.cmu.edu arxiv:1812.02215v1 [cs.cc] 5 Dec 2018
More informationarxiv: v1 [math.oc] 21 Mar 2015
Convex KKM maps, monotone operators and Minty variational inequalities arxiv:1503.06363v1 [math.oc] 21 Mar 2015 Marc Lassonde Université des Antilles, 97159 Pointe à Pitre, France E-mail: marc.lassonde@univ-ag.fr
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationAn Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization
An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization H. Mansouri M. Zangiabadi Y. Bai C. Roos Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord,
More information1 Overview. 2 Extreme Points. AM 221: Advanced Optimization Spring 2016
AM 22: Advanced Optimization Spring 206 Prof. Yaron Singer Lecture 7 February 7th Overview In the previous lectures we saw applications of duality to game theory and later to learning theory. In this lecture
More informationTHE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS
THE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS ADI BEN-ISRAEL AND YURI LEVIN Abstract. The Newton Bracketing method [9] for the minimization of convex
More informationABELIAN SELF-COMMUTATORS IN FINITE FACTORS
ABELIAN SELF-COMMUTATORS IN FINITE FACTORS GABRIEL NAGY Abstract. An abelian self-commutator in a C*-algebra A is an element of the form A = X X XX, with X A, such that X X and XX commute. It is shown
More informationSpring 2017 CO 250 Course Notes TABLE OF CONTENTS. richardwu.ca. CO 250 Course Notes. Introduction to Optimization
Spring 2017 CO 250 Course Notes TABLE OF CONTENTS richardwu.ca CO 250 Course Notes Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4, 2018 Table
More informationLecture 1: Basic Concepts
ENGG 5781: Matrix Analysis and Computations Lecture 1: Basic Concepts 2018-19 First Term Instructor: Wing-Kin Ma This note is not a supplementary material for the main slides. I will write notes such as
More informationNew stopping criteria for detecting infeasibility in conic optimization
Optimization Letters manuscript No. (will be inserted by the editor) New stopping criteria for detecting infeasibility in conic optimization Imre Pólik Tamás Terlaky Received: March 21, 2008/ Accepted:
More informationYinyu Ye, MS&E, Stanford MS&E310 Lecture Note #06. The Simplex Method
The Simplex Method Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye (LY, Chapters 2.3-2.5, 3.1-3.4) 1 Geometry of Linear
More informationGLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH
GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH Jie Sun 1 Department of Decision Sciences National University of Singapore, Republic of Singapore Jiapu Zhang 2 Department of Mathematics
More informationStability and Robustness of Weak Orthogonal Matching Pursuits
Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery
More informationarxiv: v1 [math.pr] 22 May 2008
THE LEAST SINGULAR VALUE OF A RANDOM SQUARE MATRIX IS O(n 1/2 ) arxiv:0805.3407v1 [math.pr] 22 May 2008 MARK RUDELSON AND ROMAN VERSHYNIN Abstract. Let A be a matrix whose entries are real i.i.d. centered
More informationA Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases
2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary
More information3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology 18.433: Combinatorial Optimization Michel X. Goemans February 28th, 2013 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the introductory
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationStochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationOptimisation in Higher Dimensions
CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained
More informationLecture 5. The Dual Cone and Dual Problem
IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES
U.P.B. Sci. Bull., Series A, Vol. 80, Iss. 3, 2018 ISSN 1223-7027 ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES Vahid Dadashi 1 In this paper, we introduce a hybrid projection algorithm for a countable
More informationStrong Duality: Without Simplex and without theorems of alternatives. Somdeb Lahiri SPM, PDPU. November 29, 2015.
Strong Duality: Without Simplex and without theorems of alternatives By Somdeb Lahiri SPM, PDPU. email: somdeb.lahiri@yahoo.co.in November 29, 2015. Abstract We provide an alternative proof of the strong
More informationSolution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions
Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,
More informationOn the Properties of Positive Spanning Sets and Positive Bases
Noname manuscript No. (will be inserted by the editor) On the Properties of Positive Spanning Sets and Positive Bases Rommel G. Regis Received: May 30, 2015 / Accepted: date Abstract The concepts of positive
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationWe first repeat some well known facts about condition numbers for normwise and componentwise perturbations. Consider the matrix
BIT 39(1), pp. 143 151, 1999 ILL-CONDITIONEDNESS NEEDS NOT BE COMPONENTWISE NEAR TO ILL-POSEDNESS FOR LEAST SQUARES PROBLEMS SIEGFRIED M. RUMP Abstract. The condition number of a problem measures the sensitivity
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationStrong duality in Lasserre s hierarchy for polynomial optimization
Strong duality in Lasserre s hierarchy for polynomial optimization arxiv:1405.7334v1 [math.oc] 28 May 2014 Cédric Josz 1,2, Didier Henrion 3,4,5 Draft of January 24, 2018 Abstract A polynomial optimization
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More information